Continuing studies of two separate motion-computation systems in human vision and the derivation of the functional
evidence that structure from motion depends primarily on first-order motion computation, and we demonstrate restricted
abilities of the second-order system. (3) A potent form of spatial contrast-gain-control was discovered and found to be not
only frequency selective but also orientation specific. This form of local gain control may exemplify a universal form of
neural normalization. Studies of human pattern recognition of familiar shapes (such as letters) show that its statistical
efficiency approaches an incredible 50% of the ideal detector's efficiency when the pattern is spatially bandpass filtered in
a band whose wavelength is of the same order as the pattern itself (independent of the size of the retinal image). (5)
Studies of real and simulated saccadic eye movements (in which the same sequence of images that is produced on the
retina during saccadic eye movements is artificially produced on a stationary retina) answer the following questions
about human visual perception. (i) Why don't we see the smear produced on the retina during an eye movement? (ii)
Why doesn't the world appear to move as a result of the image movements produced by eye movements? (iii) Does the
visual system require sudden stimulus onsets (such as those produced by eye movements) to initiate processing episodes?
(iv) To serve the perceptual construction of a stable representation of the world, is there a special memory to relate
images produced by successive eye movements?
Abstract


A theoretical foundation and concrete stimulus-construction methods are provided for studying
motion-from-spatial-texture without contamination by motion mechanisms sensitive to other aspects
of the signal. Specifically, examples are constructed of a special class of random stimuli called
texture quilts. Although, as we demonstrate experimentally, certain texture quilts display consistent
apparent motion, it is proven that their motion content (a) is unavailable to standard motion analysis
(such as might be accomplished by an Adelson/Bergen motion-energy analyzer, a Watson/Ahumada
motion sensor, or by any elaborated Reichardt detector), and (b) cannot be exposed to standard
motion analysis by any purely temporal signal transformation no matter how nonlinear (e.g., temporal
differentiation followed by rectification). Applying such a purely temporal transformation to any
texture quilt produces a spatiotemporal function P whose motion is unavailable to standard motion
analysis: the expected response of every Reichardt detector to P is 0 at every instant in time. The
simplest mechanism sufficient to sense the motion exhibited by texture quilts consists of three
successive stages: (i) a purely spatial linear filter, (ii) a rectifier (but not a perfect square law) to
transform regions of large negative or positive responses into regions of high positive values, and
(iii) standard motion analysis.
1.  Introduction.

Standard motion analysis. The extensive literature on the motion of random-dot cinematograms
(Anstis, 1970; Julesz, 1971; Braddick, 1973, 1974; Lappin & Bell, 1976; Bell & Lappin, 1979;
Baker & Braddick, 1982a, 1982b; Chang & Julesz, 1983a, 1983b, 1985; Ramachandran & Anstis,
1983; Nakayama & Silverman, 1984; van Doorn & Koenderink, 1984) points toward the view that a
"short-range" system (Braddick, 1973, 1974) submits the raw spatiotemporal luminance function
directly to standard motion analysis (such as might be accomplished by an Adelson/Bergen
motion-energy detector (Adelson & Bergen, 1985), a Watson/Ahumada motion sensor (Watson & Ahumada,
1983a, 1983b, 1985), an elaborated Reichardt detector (van Santen & Sperling, 1984, 1985), or some
variants of a gradient detector (Marr & Ullman, 1981; Adelson & Bergen, 1986)).

Fourier and nonFourier mechanisms. An impressive number of observations suggests that
standard motion analysis is not the whole story (Sperling, 1976; Ramachandran, Rao & Vidyasagar,
1973; Petersik, Hicks & Pantle, 1978; Ramachandran, Ginsburg & Anstis, 1983; Lelkins & Koenderink,
1984; Derrington & Badcock, 1985; Green, 1986; Pantle & Turano, 1986; Derrington & Henning,
1987; Turano & Pantle, 1988; Bowne, McKee & Glaser, 1989; Cavanagh, Arguin & von
Grunau, 1989). In particular, Chubb and Sperling (1987, 1988) have demonstrated a variety of
stimuli that display consistent, unambiguous apparent motion, yet that do not systematically stimulate
mechanisms that apply standard motion analysis directly to luminance. For reasons that will become
clear in Section 2, we call any motion system that applies standard analysis to the raw signal a
Fourier mechanism, and we refer to any system that applies standard analysis to a nonlinear
transformation of the signal as a nonFourier mechanism.

Microbalanced stimuli. The methods used by Chubb & Sperling to construct stimuli whose
obvious and consistent motion content cannot be revealed by applying standard motion analysis
directly to luminance are founded on the notion of a microbalanced random stimulus. In Section
2.3.5, we show that the expected response of any standard motion analyzer applied directly to any
microbalanced random stimulus is equal to the expected response of the corresponding analyzer tuned
to motion of the same type, but in the opposite direction.

Microbalanced random stimuli allow us to differentially stimulate nonFourier motion mechanisms
without systematically engaging Fourier mechanisms. This is the source of their importance in
the study of motion perception.

There are probably several types of nonFourier motion mechanisms, distinguished by the different
nonlinear transformations they apply to the signal prior to standard motion analysis. In this
paper, we extend the theory of microbalanced random stimuli in order to develop methods for
constructing stimuli that selectively engage specific classes of nonFourier mechanisms without
stimulating either Fourier mechanisms or other classes of nonFourier mechanisms.

Pointwise transformations, static nonlinearities. A transformation T is called pointwise if the
output of T at any point (x, y, t) in space-time depends only on the (stimulus) input value at that
point. A nonlinear pointwise transformation sometimes is called a static nonlinearity. For instance,
simple rectifiers and thresholders are pointwise transformations. In Section 3, we address the problem
of isolating the class of nonFourier mechanisms that apply a simple pointwise transformation prior to
standard motion analysis from the class of all those mechanisms that apply more complicated
transformations. The central result in this section is proposition 3.2, which provides necessary and
sufficient conditions for a random stimulus I to be such that any pointwise transformation of I is
microbalanced.

Purely temporal transformations and texture quilts. The results with pointwise transformations
are extended in Section 4 to purely temporal transformations (defined in Section 2.2). Whereas, for a
pointwise transformation, the transformed value at the point (x, y, t) depends only on the stimulus
value at (x, y, t), in a purely temporal transformation the transformed value at (x, y, t) may depend in
any way whatsoever on the entire history of stimulus values at (x, y). We define the class of stimuli
called texture quilts (Definition 4.1) whose importance derives from the fact (proven in proposition
4.3) that any purely temporal transformation of a texture quilt is microbalanced. Concrete methods
are provided for constructing binary and sinusoidal texture quilts that display consistent motion.

In Section 5, these construction methods are applied in an experiment designed to demonstrate
the effectiveness of three textural properties as carriers of motion information. The textural properties
are (i) spatial frequency variation, (ii) orientation variation, and (iii) variation between perceptually
distinct textures with identical expected energy spectra.

2.  Preliminaries. 

This  section  states  the  background  facts  presupposed  by  the  main  discussion  of  the  paper. 

2.1.  Discrete  dynamic  visual  stimuli. 

Notation. Let $R$ denote the real numbers, and $Z$ ($Z^+$) the integers (positive integers). We use
square brackets to enclose arguments of discrete functions, and parentheses to enclose arguments of
continuous functions.

The range of a stimulus. We want the term "stimulus" to refer not only to the luminance function
submitted as input to the retina, but to any physiologically reasonable transformation of the
spatiotemporal luminance function which might be submitted as input to a component processor of the
visual system. Consequently, although luminance is physically a non-negative quantity, we do not
apply this constraint to the class of functions we admit as stimuli. We allow stimuli to take values
throughout the positive and negative real numbers.

The domain of a stimulus. To remain close to our intuitions about neurally realized visual
processors, we take stimuli to be functions of the discrete domain $Z^3$ (where the dimensions
correspond to horizontal and vertical space, and time). In addition, for mathematical convenience,
and without loss of physiological plausibility, we require a stimulus to be 0 almost everywhere in its
(infinite) domain.

The definition of a stimulus. We call any function $I: Z^3 \to R$ a stimulus provided
$I[x, y, t] = 0$ for all but finitely many points of $Z^3$.


August  7, 1990 


Chubb & Sperling: Texture Quilts




We shall be considering stimuli as functions of two spatial dimensions x, y and time t.

Stimulus contrast. As is now well-established (e.g., Shapley & Enroth-Cugell, 1984), early
retinal gain-control mechanisms pass not stimulus luminance, but rather a signal approximating
stimulus contrast, the normalized deviation at each time $t$ of luminance at each point $(x, y)$ in the
visual field from a "background level", or "level of adaptation", which reflects the average luminance
over points proximal to $(x, y, t)$ in space and time. Because the transformation from luminance to
contrast is a processing stage that is general to all of vision, we shall drop reference to mean
luminance $L_0$, and characterize $L$ only by its contrast modulation function, $C$:

$$C = \frac{L}{L_0} - 1. \tag{1}$$
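As a small numerical illustration of the luminance-to-contrast step, the following sketch assumes that Eq. (1) has the usual form C = L/L0 - 1, with a single scalar background level L0. The function name, the sample luminances, and the background value are hypothetical and chosen only for illustration; in the text the background level is a local average over space and time, which this sketch does not model.

```python
def contrast(luminance, mean_luminance):
    """Contrast modulation C = L / L0 - 1 for each luminance sample (assumed form of Eq. 1)."""
    if mean_luminance <= 0:
        raise ValueError("mean luminance must be positive")
    return [value / mean_luminance - 1.0 for value in luminance]


# Hypothetical luminance samples around a background level of 50 units.
samples = [25.0, 50.0, 75.0, 100.0]
c = contrast(samples, 50.0)
```

Luminance is non-negative, but the resulting contrast takes both signs, which is why stimuli are allowed to range over the positive and negative reals.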

What we shall argue in this paper is that the broad-band spatial filtering that mediates the step
from luminance to contrast is succeeded by additional filtering stages in which a number of narrowly
tuned spatial filters are applied to the visual signal, their output rectified, and the resulting
spatiotemporal signal processed for motion information.

The history of a stimulus at a point in space. For any stimulus $I$, any point $(x, y) \in Z^2$, we
define the history of $I$ at $(x, y)$, $I_{(x,y)}$, by setting

$$I_{(x,y)}[t] = I[x, y, t] \tag{2}$$

for all $t \in Z$.

Space-time separable stimuli. A stimulus $I$ is called space-time separable iff $I$ can be
expressed as the product of a spatial function $f: Z^2 \to R$ and a temporal function $g: Z \to R$: for all
$(x, y, t) \in Z^3$, $I[x, y, t] = f[x, y]\, g[t]$.
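The definition of space-time separability can be made concrete with a short sketch. It assumes a stimulus is stored as a dictionary mapping points (x, y, t) to values, with every unlisted point equal to 0 (matching the finite-support requirement). The function names and example values are hypothetical. The check uses the fact that a function of (point, time) factors as f times g exactly when all of its 2-by-2 cross-products agree.

```python
def separable_stimulus(f, g):
    """Build I[x, y, t] = f[x, y] * g[t] on the finite support of f and g."""
    return {(x, y, t): fv * gv
            for (x, y), fv in f.items()
            for t, gv in g.items()}


def is_separable(I):
    """True iff I[p, t] * I[q, s] == I[p, s] * I[q, t] for all points p, q and times t, s."""
    points = sorted({(x, y) for (x, y, _) in I})
    times = sorted({t for (_, _, t) in I})

    def val(p, t):
        return I.get((p[0], p[1], t), 0)

    return all(val(p, t) * val(q, s) == val(p, s) * val(q, t)
               for p in points for q in points for t in times for s in times)


# Hypothetical spatial and temporal factors (integers keep the arithmetic exact).
f = {(0, 0): 1, (1, 0): 2}
g = {0: 1, 1: -1, 2: 3}
separable = separable_stimulus(f, g)

# A spot at x = 1 in frame 0, then spots at x = 0 and x = 2 in frame 1: not separable.
two_flash = {(1, 0, 0): 1, (0, 0, 1): 1, (2, 0, 1): 1}
```

With floating-point values the exact equality test would need a tolerance; integers are used here to sidestep that.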

The Fourier transform of a stimulus. Because any stimulus $I$ is nonzero at only a finite
number of points, the energy in $I$ is finite, implying that $I$ has a well-defined Fourier transform.
We denote $I$'s Fourier transform by $\hat{I}$:

$$\hat{I}(\omega, \theta, \tau) = \sum_{x \in Z} \sum_{y \in Z} \sum_{t \in Z} I[x, y, t]\, e^{-j(\omega x + \theta y + \tau t)}. \tag{3}$$

Although $\hat{I}$ is defined for all real numbers $\omega, \theta, \tau$, it is periodic over $2\pi$ in each variable. This
fact is reflected in the inverse transform:

$$I[x, y, t] = \frac{1}{(2\pi)^3} \int_0^{2\pi}\!\int_0^{2\pi}\!\int_0^{2\pi} \hat{I}(\omega, \theta, \tau)\, e^{j(\omega x + \theta y + \tau t)}\, d\omega\, d\theta\, d\tau. \tag{4}$$

In the Fourier domain, we consistently use $\omega$ to index frequencies relative to $x$, $\theta$ to index
frequencies relative to $y$, and $\tau$ to index frequencies relative to $t$.

The function 0. We write 0 for any function that assigns 0 to each element in its domain.
Thus, 0 defined on $Z^3$ is the stimulus that is zero throughout space and time. We also write 0 for the
temporal function that sets $0[t] = 0$ for all $t \in Z$.

2.2.  Mappings and stimulus transformations.

Let $\Omega$ be the set of all real-valued functions of $Z^3$, and call any function of $\Omega$ into $\Omega$ a mapping.
(We shall need the general notion of a mapping only briefly in order to specify the subset of
well-behaved mappings called transformations.) For any mapping $M$ and any $I \in \Omega$, $M(I)$ is a
real-valued function of $Z^3$; accordingly, we write $M(I)[x, y, t]$ for the value of $M(I)$ at any point
$(x, y, t) \in Z^3$.

If it is continuous, a function $f: R \to R$ submits to a wide range of useful operations. For
instance, if $f$ is continuous, it can be integrated over any finite interval. Of course, $f$ need not be
continuous to meet this condition. For instance, $f$ is integrable over any finite interval if $f$ is
discontinuous at only a finite number of points in any finite interval. If $f$ is integrable over any finite
interval, and if $f$ also is bounded, then for any function $g$ for which $\int g$ converges, $\int fg$ also converges.
In particular, $\int fg$ converges if $g$ is a density function. For the results reported here, we restrict our
attention to a special class of mappings, which we shall call stimulus transformations, that have
properties analogous to those of bounded, integrable functions. We specify these conditions in the
following.

Continuous mappings, integrable mappings, bounded mappings. For any $I \in \Omega$,
any $p \in R$, and any $v \in Z^3$, write $I_{v,p}$ for the element of $\Omega$ that is identical to $I$ at all locations of $Z^3$
except $v$, where it takes the value $p$. Any mapping $M$ is called continuous if $M(I_{v,p})[u]$ is a
continuous function of $p$ for any $I \in \Omega$ and any $u, v \in Z^3$. $M$ is called finitely integrable if, for any
such $I$, $u$, and $v$, $M(I_{v,p})[u]$ is integrable as a function of $p$ over any finite interval. Finally, $M$ is
called bounded if, for any such $I$, $u$, and $v$, $M(I_{v,p})[u]$ is bounded as a function of $p$ over the set of
real numbers.

The definition of a stimulus transformation. A stimulus transformation (which we
often refer to simply as a transformation) is a bounded, finitely integrable mapping $T$ such that $T(S)$
is a stimulus for any stimulus $S$, and $T(0) = 0$.

There are other reasonable constraints we might impose on the notion of a stimulus transformation.
For instance, we might require a stimulus transformation to be time-invariant and causal. However,
we do not include these conditions in our definition because they are not required for the results
we report.

Purely temporal stimulus transformations. Let $\Omega_T$ be the set of all functions mapping $Z$ into
$R$. A transformation $H$ is called purely temporal iff there exists a function $H_T: \Omega_T \to \Omega_T$ such that
for any stimulus $I$, any $(x, y, t) \in Z^3$,

$$H(I)[x, y, t] = H_T(I_{(x,y)})[t]. \tag{5}$$

That is, the value at the point $(x, y, t) \in Z^3$ that results from applying $H$ to $I$ depends only on the
history of $I$ at $(x, y)$. Since it is obvious from the context, we drop the distinction between $H$ and $H_T$,
and allow $H$ to be applied both to full-fledged stimuli and to simple functions of time. Thus, for any
temporal function $P: Z \to R$, we shall write $H(P)$ to indicate the temporal function $H_T(P)$.
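A purely temporal transformation can be sketched as a function of time histories that is applied independently at every spatial location. The sketch below assumes a stimulus stored as a dictionary over (x, y, t) with unlisted points equal to 0, and a finite list of frame times. The helper names are hypothetical, and the example H_T (temporal differencing followed by full-wave rectification) is just one illustrative choice of nonlinear temporal transformation.

```python
def history(I, x, y, times):
    """The history of I at (x, y): its values over the listed times."""
    return [I.get((x, y, t), 0) for t in times]


def apply_purely_temporal(H_T, I, times):
    """H(I)[x, y, t] = H_T(history of I at (x, y))[t], independently at each point.

    Only points in the support of I are visited, which is valid when H_T maps
    the all-zero history to the all-zero history."""
    points = sorted({(x, y) for (x, y, _) in I})
    out = {}
    for (x, y) in points:
        transformed = H_T(history(I, x, y, times))
        for t, value in zip(times, transformed):
            out[(x, y, t)] = value
    return out


def rectified_difference(p):
    """|P[t] - P[t-1]|, taking P to be 0 before the first listed time."""
    previous = [0] + list(p[:-1])
    return [abs(a - b) for a, b in zip(p, previous)]


# Hypothetical stimulus with two active spatial locations.
I = {(0, 0, 0): 1, (0, 0, 1): -1, (1, 0, 1): 2}
times = [0, 1, 2]
out = apply_purely_temporal(rectified_difference, I, times)
```

The output at (0, 0) never depends on the values at (1, 0), which is the defining property: no spatial interaction of any kind.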
We now introduce several types of transformation of particular relevance.

Pointwise transformations and rectifiers. For any functions $f: A \to B$ and $g: B \to C$, the
composition $g \bullet f: A \to C$ is given by

$$g \bullet f(a) = g(f(a)) \tag{6}$$

for any $a \in A$. For any $f: R \to R$, we call the mapping $f\bullet$, yielding the spatiotemporal function $f \bullet I$
when applied to stimulus $I$, a pointwise mapping (because its output value at any point in space-time
depends only on its input value at that point).

As is evident, $f\bullet$ is a transformation iff (i) $f(0) = 0$, (ii) $f$ is bounded on $R$, and (iii) $f$ is
integrable over any bounded real interval. A transformation $f\bullet$ is called a positive half-wave rectifier
if $f$ is monotonically increasing, and $f(v) = 0$ for all $v \le 0$. $f\bullet$ is called a negative half-wave rectifier
if $f$ is monotonically decreasing, and $f(v) = 0$ for $v \ge 0$. Finally, $f\bullet$ is called a full-wave rectifier if
$f$ is a monotonically increasing function of absolute value.
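The three rectifier types can be illustrated with a minimal sketch. The piecewise-linear shapes and the saturation level `CEILING` are assumptions made here for illustration (the saturation is included only so that each f stays bounded on R, as the definition of a transformation requires); the text does not prescribe any particular functional form.

```python
CEILING = 1e6  # hypothetical saturation level that keeps each f bounded on R


def positive_half_wave(v):
    """Non-decreasing, and 0 for all v <= 0."""
    return min(max(v, 0.0), CEILING)


def negative_half_wave(v):
    """Non-increasing, and 0 for all v >= 0."""
    return min(max(-v, 0.0), CEILING)


def full_wave(v):
    """A non-decreasing function of |v|."""
    return min(abs(v), CEILING)


def apply_pointwise(f, I):
    """The pointwise mapping f applied to a stimulus stored as {(x, y, t): value}.

    Points outside the support stay at 0 because f(0) = 0."""
    return {point: f(value) for point, value in I.items()}


# Hypothetical stimulus with one negative and one positive contrast value.
I = {(0, 0, 0): -2.0, (1, 0, 0): 3.0}
rectified = apply_pointwise(full_wave, I)
```

Each f satisfies f(0) = 0, so applying it to a finitely supported stimulus again yields a finitely supported stimulus.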

Linear, shift-invariant (LSI) transformations. For any offset $v \in Z^3$, define the mapping $S^v$
by

$$S^v(I)[u] = I[u - v] \tag{7}$$

for any $I \in \Omega$. Thus $S^v(I)$ is derived by shifting $I$ by the offset $v$ in $Z^3$. Any mapping $M$ is called
shift-invariant iff

$$S^v(M(I)) = M(S^v(I)) \tag{8}$$

for any $v \in Z^3$, any $I \in \Omega$. In addition, $M$ is linear iff for any $I, J \in \Omega$, any real numbers $\kappa$ and $\lambda$,

$$M(\kappa I + \lambda J) = \kappa M(I) + \lambda M(J). \tag{9}$$

As is well known, any linear, shift-invariant (LSI) transformation can be expressed as a convolution,
which is defined for any $u \in Z^3$ by

$$(k * I)[u] = \sum_{v \in Z^3} k[u - v]\, I[v] \tag{10}$$

for some $k: Z^3 \to R$. The function $k$ is called the impulse response of the transformation $k*$.
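The convolution of Eq. (10) and the shift of Eq. (7) can be sketched directly for finitely supported functions. The sketch assumes both the impulse response and the stimulus are stored as dictionaries over (x, y, t) with unlisted points equal to 0; the function names and example values are hypothetical.

```python
def convolve(k, I):
    """(k * I)[u] = sum over v of k[u - v] * I[v], for finitely supported k and I."""
    out = {}
    for (kx, ky, kt), kv in k.items():
        for (x, y, t), iv in I.items():
            u = (kx + x, ky + y, kt + t)  # u - v = (kx, ky, kt)
            out[u] = out.get(u, 0) + kv * iv
    return out


def shift(I, v):
    """S^v(I)[u] = I[u - v]: every point of the support moves by the offset v."""
    return {(x + v[0], y + v[1], t + v[2]): value for (x, y, t), value in I.items()}


# Hypothetical impulse response: a temporal difference, k[t=0] = 1 and k[t=1] = -1.
k = {(0, 0, 0): 1, (0, 0, 1): -1}
I = {(0, 0, 0): 2, (0, 0, 1): 5, (1, 0, 1): 3}
filtered = convolve(k, I)
```

Shift-invariance (Eq. 8) can then be checked by comparing `convolve(k, shift(I, v))` with `shift(convolve(k, I), v)` for any offset `v`.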

2.3.  Random stimuli.

For any real random variable $X$ with density $f$, we write $E[X]$ for the expectation of $X$:

$$E[X] = \int x f(x)\, dx. \tag{11}$$

The notion of a random stimulus generalizes that of a (non-random) stimulus in that the values
assigned points in space-time by a random stimulus are random variables (with finite variances) rather
than constants.

The definition of a random stimulus. Call any family $\{I[x, y, t] \mid (x, y, t) \in Z^3\}$ of jointly
distributed random variables a random stimulus provided

(i) $I[x, y, t]$ is constant and equal to 0 for all but finitely many $(x, y, t) \in Z^3$,
and

(ii) $E[I[x, y, t]^2]$ exists for all $(x, y, t) \in Z^3$.

As with non-random stimuli, we write $\hat{I}$ for the Fourier transform of any random stimulus $I$; and, for
any $(x, y) \in Z^2$, $I_{(x,y)}$ is the temporal random function defined by

$$I_{(x,y)}[t] = I[x, y, t] \tag{12}$$

for all times $t \in Z$.

Space-time separable random stimuli. We call a random stimulus $I$ space-time separable iff
$I$ is space-time separable with probability 1.

Constant stimuli. Any ordinary stimulus can be regarded as a random stimulus that does not
vary across independent realizations. We call such unvarying stimuli constant.

The motion-from-Fourier-components principle. Parseval's relation states that the energy
in a stimulus is proportional to the energy in its Fourier transform. Individual spatiotemporal Fourier
components are drifting sinusoidal gratings. Thus, we can add up the energy in a dynamic visual
stimulus either point-by-point in space-time, or drifting sinusoid by drifting sinusoid. A commonly
encountered rule of thumb (Watson, Ahumada & Farrell, 1986; Watson & Ahumada, 1983b; van Santen
& Sperling, 1985) for predicting the apparent motion of an arbitrary stimulus $I[x, y, t] = I[x, t]$
(constant in the vertical dimension of space) is the motion-from-Fourier-components principle: For $I$
regarded as a linear combination of drifting sinusoidal gratings, if most of $I$'s energy is contributed by
rightward-drifting gratings, then perceived motion should be to the right. If most of the energy
resides in the leftward-drifting gratings, perceived motion should be to the left. Otherwise $I$ should
manifest no decisive motion in either direction.
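The rule of thumb above can be sketched numerically for a stimulus I[x, t] sampled on a small periodic grid. The sketch assumes the transform convention exp(-j(ωx + τt)), under which a component at (ω, τ) drifts with velocity -τ/ω, so components with ωτ < 0 drift rightward. The function names, the grid size, and the use of a discrete Fourier transform over one period are assumptions made for illustration.

```python
import cmath
import math


def dft2(I):
    """F[kt][kx] = sum over t, x of I[t][x] * exp(-2*pi*j*(kt*t/T + kx*x/X))."""
    T, X = len(I), len(I[0])
    return [[sum(I[t][x] * cmath.exp(-2j * math.pi * (kt * t / T + kx * x / X))
                 for t in range(T) for x in range(X))
             for kx in range(X)] for kt in range(T)]


def signed(k, n):
    """Map a DFT index 0..n-1 to a signed frequency index."""
    return k - n if k > n // 2 else k


def directional_energy(I):
    """Return (rightward, leftward) energy summed over drifting components."""
    T, X = len(I), len(I[0])
    F = dft2(I)
    right = left = 0.0
    for kt in range(T):
        for kx in range(X):
            st, sx = signed(kt, T), signed(kx, X)
            if 2 * abs(st) == T or 2 * abs(sx) == X:
                continue  # Nyquist components have no unambiguous direction
            energy = abs(F[kt][kx]) ** 2
            if st * sx < 0:
                right += energy   # opposite-signed frequencies: rightward drift
            elif st * sx > 0:
                left += energy    # same-signed frequencies: leftward drift
    return right, left


N = 8
rightward_grating = [[math.cos(2 * math.pi * (x - t) / N) for x in range(N)] for t in range(N)]
leftward_grating = [[math.cos(2 * math.pi * (x + t) / N) for x in range(N)] for t in range(N)]
right_energy, left_energy = directional_energy(rightward_grating)
```

For a single drifting grating essentially all of the energy falls on one side, so the principle predicts motion in the grating's drift direction.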

Drift-balanced random stimuli. The class of drift-balanced random stimuli (Chubb & Sperling,
1987, 1988) provides a rich pool of counterexamples to the motion-from-Fourier-components
principle. A random stimulus $I$ is drift balanced iff the expected energy in $I$ of each drifting
sinusoidal component is equal to the expected energy of the component of the same spatial frequency,
drifting at the same rate, but in the opposite direction. The term drift balanced is defined formally as
follows.

Definition of a drift-balanced random stimulus. Call any random stimulus $I$ drift balanced
iff

$$E\big[|\hat{I}(\omega, \theta, \tau)|^2\big] = E\big[|\hat{I}(\omega, \theta, -\tau)|^2\big] \tag{13}$$

for all $(\omega, \theta, \tau) \in R^3$.

Thus, for any class of spatiotemporal linear receptors tuned to stimulus energy in a certain
spatiotemporal frequency band, a drift-balanced random stimulus will, on the average, stimulate equally
well those receptors tuned to the corresponding band of opposite temporal orientation.

Microbalanced random stimuli. Consider the following two-flash stimulus $S$: In flash 1, a
bright spot (call it Spot 1) appears. In flash 2, Spot 1 disappears, and two new spots appear, one to the
left and one symmetrically to the right of Spot 1. As one might suppose, $S$ is drift balanced. On the
other hand, it is equally clear that a Fourier motion detector whose spatial reach encompasses the
location of Spot 1 and only one of the spots in flash 2 may well be stimulated in a fixed direction by
$S$. Thus, although $S$ is drift balanced, some Fourier motion detectors may be stimulated strongly and
systematically by $S$. These detectors can be differentially selected by spatial windowing, and thereby
the drift-balanced stimulus $S$ is converted into a non-drift-balanced stimulus by multiplying it by an
appropriate space-time separable function. The following subclass of drift-balanced random stimuli
cannot be made non-drift-balanced by space-time separable windowing.

* For a proof that the expected energy of the Fourier transform of any random stimulus is everywhere
well-defined, see Chubb & Sperling, 1988, Appendix A.
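The two-flash example can be checked numerically. The sketch below treats the stimulus as constant (non-random), so expected energy is simply energy, and tests the drift-balance condition on a small periodic grid with one spatial dimension. The grid size, spot positions, function names, and the particular spatial window are assumptions chosen for illustration.

```python
import cmath
import math


def energy_spectrum(I):
    """|F[kt][kx]|^2 for a stimulus I[t][x], using exp(-2*pi*j*(kt*t/T + kx*x/X))."""
    T, X = len(I), len(I[0])
    return [[abs(sum(I[t][x] * cmath.exp(-2j * math.pi * (kt * t / T + kx * x / X))
                     for t in range(T) for x in range(X))) ** 2
             for kx in range(X)] for kt in range(T)]


def is_drift_balanced(I, tol=1e-9):
    """Energy at each (omega, tau) must equal the energy at (omega, -tau)."""
    E = energy_spectrum(I)
    T, X = len(E), len(E[0])
    return all(abs(E[kt][kx] - E[(-kt) % T][kx]) < tol
               for kt in range(T) for kx in range(X))


T, X = 4, 4
two_flash = [[0] * X for _ in range(T)]
two_flash[0][1] = 1   # flash 1: Spot 1 at x = 1
two_flash[1][0] = 1   # flash 2: spot to the left of Spot 1
two_flash[1][2] = 1   # flash 2: spot to the right of Spot 1

# Space-time separable window that keeps only x = 0 and x = 1 at all times.
w_x = [1, 1, 0, 0]
w_t = [1, 1, 1, 1]
windowed = [[two_flash[t][x] * w_x[x] * w_t[t] for x in range(X)] for t in range(T)]
```

In this sketch the full stimulus passes the check while the windowed stimulus fails it, which is the situation that motivates the stronger notion of microbalance.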

Definition of a microbalanced random stimulus. Call any random stimulus $I$ microbalanced
iff the product $IW$ is drift balanced for any space-time separable function $W$.

One can think of the multiplying function $W$ as a "window" through which a spatiotemporal
subregion of $I$ can be "viewed" in isolation. The space-time separability of $W$ insures that $W$ is
"transparent" with respect to the motion content of the region to which it is applied: $W$ does not
distort $I$'s motion with any motion content of its own. The fact that $I$ is microbalanced means that any
subregion of $I$ encountered through a "motion-transparent window" is drift balanced.

The following characterization of the class of microbalanced random stimuli, and all other
results stated without proof in this section, are from Chubb and Sperling (1988).

2.3.1. A random stimulus $I$ is microbalanced if and only if

$$E\big[I[x, y, t]\, I[x', y', t'] - I[x, y, t']\, I[x', y', t]\big] = 0 \tag{14}$$

for all $x, y, t, x', y', t' \in Z$.
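The condition of proposition 2.3.1 can be evaluated exactly for a random stimulus with finitely many realizations. The sketch below assumes such a stimulus is given as a list of (probability, realization) pairs, each realization being a dictionary over (x, y, t) with unlisted points equal to 0. The function name and both example stimuli are hypothetical.

```python
from itertools import product


def microbalance_defect(realizations):
    """Largest |E[I[p,t] I[q,s] - I[p,s] I[q,t]]| over all points p, q and times t, s.

    A value of 0 means the condition of proposition 2.3.1 holds on the support."""
    points = sorted({(x, y) for _, I in realizations for (x, y, _) in I})
    times = sorted({t for _, I in realizations for (_, _, t) in I})

    def val(I, p, t):
        return I.get((p[0], p[1], t), 0)

    worst = 0.0
    for p, q, t, s in product(points, points, times, times):
        expectation = sum(prob * (val(I, p, t) * val(I, q, s) - val(I, p, s) * val(I, q, t))
                          for prob, I in realizations)
        worst = max(worst, abs(expectation))
    return worst


# A constant two-flash stimulus: one spot in frame 0, two flanking spots in frame 1.
two_flash = [(1.0, {(1, 0, 0): 1, (0, 0, 1): 1, (2, 0, 1): 1})]

# A space-time separable pattern f[x, y] * g[t] shown with a random sign.
f = {(0, 0): 1, (1, 0): 2}
g = {0: 1, 1: -1}
pattern = {(x, y, t): fv * gv for (x, y), fv in f.items() for t, gv in g.items()}
negated = {point: -value for point, value in pattern.items()}
random_sign = [(0.5, pattern), (0.5, negated)]
```

The separable random stimulus gives a defect of 0, while the constant two-flash stimulus does not, in line with the earlier remark that it can be unbalanced by windowing.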

Some other relevant facts about microbalanced random stimuli:

2.3.2. For any independent microbalanced random stimuli $I$ and $J$,

I. the product $IJ$ is microbalanced, and

II. the convolution $I * J$ is microbalanced.

2.3.3. (a) Any space-time separable random stimulus is microbalanced; (b) any constant microbalanced
stimulus is space-time separable.

The following result is useful in constructing a wide range of microbalanced random stimuli
which display striking apparent motion.

2.3.4. Let $\Gamma$ be a family of pairwise independent, microbalanced random stimuli, all but at most one
of which have expectation 0. Then any linear combination of $\Gamma$ is microbalanced.

Reichardt detectors and microbalanced random stimuli. Two Fourier motion detectors proposed
for psychophysical data (Adelson & Bergen, 1985; Watson & Ahumada, 1983a, 1983b) can be
recast as Reichardt detectors (Adelson & Bergen, 1985; van Santen & Sperling, 1985). The
Reichardt detector has many useful properties as a motion detector without regard to its specific
instantiation (van Santen & Sperling, 1984, 1985).

FIG. 1

Figure 1 shows a diagram of the Reichardt detector. It consists of spatial receptors characterized
by spatial functions $f_1$ and $f_2$, temporal filters $g_1*$ and $g_2*$, multipliers, a differencer, and
another temporal filter $h*$. The spatial receptors $f_i$, $i = 1, 2$, act on the input stimulus $I$ to produce
intermediate outputs,

$$y_i[t] = \sum_{(x, y) \in Z^2} f_i[x, y]\, I[x, y, t]. \tag{15}$$

At the next stage, each temporal filter $g_j*$ transforms its input $y_i$ ($i, j = 1, 2$), yielding four temporal
output functions: $g_j * y_i$. The left and right multipliers then compute the products

$$(y_1 * g_1)[t]\,(y_2 * g_2)[t] \quad \text{and} \quad (y_1 * g_2)[t]\,(y_2 * g_1)[t] \quad \text{respectively,} \tag{16}$$

and the differencer subtracts the output of the right multiplier from that of the left multiplier:

$$D[t] = (y_1 * g_1)[t]\,(y_2 * g_2)[t] - (y_1 * g_2)[t]\,(y_2 * g_1)[t]. \tag{17}$$
Fig. 1. The Reichardt detector. Let $I$ be a random stimulus. Then, in response to $I$, for $i = 1, 2$, the box
containing the spatial function $f_i: Z^2 \to R$ outputs the temporal function $\sum_{(x,y)} f_i[x, y]\, I[x, y, t]$; each of
the boxes marked $g_i*$ outputs the convolution of its input with the temporal function $g_i: Z \to R$; each of
the boxes marked with a multiplication sign outputs the product of its inputs; the box marked with a minus
sign outputs its left input minus its right; and the box containing $h*$ outputs the convolution of its input
with the temporal function $h: Z \to R$. To see how the Reichardt detector senses motion, suppose $f_2$ is
identical to $f_1$, but shifted in space by some offset, and suppose the filters $g_1*$ do not alter their input,
while the filters $g_2*$ simply delay their input by some amount $\delta_t$ of time. Then a rigidly translating pattern
moving in the direction of box $f_2$'s offset from box $f_1$ will elicit some time-varying response from box $f_1$,
and the same response a short time later from box $f_2$. If that "short time later" is precisely $\delta_t$, the output of
the righthand multiplier will be positive as long as the pattern keeps drifting. This will result in a net
negative Reichardt detector output. If the pattern drift is in the opposite direction, the detector response will be
positive.
The final output is produced by applying the filter $h*$, whose purpose is to smooth the time-varying
differencer output $D$. Since many Fourier mechanisms can be expressed as, or closely approximated
by, Reichardt detectors (van Santen & Sperling, 1985; Adelson & Bergen, 1985, 1986), the following
characterization of the class of microbalanced stimuli can be regarded as the cornerstone of the claim
that microbalanced random stimuli bypass Fourier motion mechanisms.
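The stages of the Reichardt detector described above can be sketched for a stimulus with one spatial dimension. The sketch assumes causal temporal filters given as lists of impulse-response values indexed by lag, point-like spatial receptors, and a one-frame delay; all names and parameter values are hypothetical choices for illustration.

```python
def convolve1d(y, g):
    """Causal temporal convolution (y * g)[t] = sum over s of g[s] * y[t - s], with y = 0 for t < 0."""
    return [sum(g[s] * y[t - s] for s in range(len(g)) if t - s >= 0)
            for t in range(len(y))]


def reichardt(I, f1, f2, g1, g2, h):
    """Response of a Reichardt detector to I[t][x].

    f1, f2: spatial receptor weights over x; g1, g2, h: temporal impulse responses."""
    y1 = [sum(w * v for w, v in zip(f1, frame)) for frame in I]   # receptor outputs
    y2 = [sum(w * v for w, v in zip(f2, frame)) for frame in I]
    y1g1, y1g2 = convolve1d(y1, g1), convolve1d(y1, g2)
    y2g1, y2g2 = convolve1d(y2, g1), convolve1d(y2, g2)
    # Differencer: left product minus right product.
    D = [a * b - c * d for a, b, c, d in zip(y1g1, y2g2, y1g2, y2g1)]
    return convolve1d(D, h)


X, T = 6, 6
f1 = [1 if x == 2 else 0 for x in range(X)]   # point receptor at x = 2
f2 = [1 if x == 3 else 0 for x in range(X)]   # same receptor, offset one step rightward
g1 = [1]        # passes its input unchanged
g2 = [0, 1]     # delays its input by one frame
h = [1]         # no smoothing, for clarity

rightward = [[1 if x == t else 0 for x in range(X)] for t in range(T)]
leftward = [[1 if x == (X - 1) - t else 0 for x in range(X)] for t in range(T)]
response_right = reichardt(rightward, f1, f2, g1, g2, h)
response_left = reichardt(leftward, f1, f2, g1, g2, h)
```

With these choices a spot moving in the direction of the second receptor's offset yields a net negative response, and a spot moving the opposite way yields a net positive response, matching the sign convention in the Fig. 1 caption.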

2.3.5. For any random stimulus $I$, the following conditions are equivalent:

I. $I$ is microbalanced.

II. The expected response of every Reichardt detector to $I$ is 0 at every instant in time.

Proof. Chubb & Sperling (1988) proved that I implies II. To obtain the reverse implication, note that
if II holds, then, in particular, for any points $(x, y), (x', y') \in Z^2$ and any $\delta_t \in Z$, the expected
response to $I$ is the temporal function 0 for a particular simple Reichardt detector that computes

$$I[x, y, t]\, I[x', y', t - \delta_t] - I[x, y, t - \delta_t]\, I[x', y', t]. \tag{18}$$

This Reichardt detector is constructed by making (i) $f_1$ (of Fig. 1) the function that takes the value 1
at $(x, y)$ and 0 everywhere else, (ii) $f_2$ the function that takes the value 1 at $(x', y')$ and 0 everywhere
else, (iii) each of $g_1*$ and $h*$ the identity transformation, and (iv) $g_2*$ the filter that delays its input
by $\delta_t$ units of time. However, if the expected response to $I$ is 0 throughout time for any such
Reichardt detector, then Eq. (14) holds, and proposition 2.3.1 implies that $I$ is microbalanced. ∎

3.  Random  stimuli  microbalanced  under  all  pointwise  transformations. 

The main purpose of this paper is to provide tools for differentially stimulating specific types of nonFourier motion mechanisms without engaging either Fourier mechanisms or other types of nonFourier mechanisms. A nonFourier motion mechanism is one that applies an initial nonlinear transformation to the visual signal and subjects the output to standard motion analysis. In this section, we provide some results relevant to the psychophysical problem of stimulating nonFourier mechanisms whose initial transformation is nonpointwise without engaging any mechanism whose initial transformation is pointwise. The main finding is stated in proposition 3.2, which provides necessary and sufficient conditions for a random stimulus I to be such that f•I is microbalanced for any pointwise transformation f•. In Section 4 we shall apply this result to construct random stimuli (texture quilts) which are microbalanced, and are, moreover, guaranteed to remain microbalanced after any purely temporal transformation. Such stimuli are useful for selectively stimulating nonFourier motion mechanisms that extract motion information from stimuli that have undergone nonlinear spatial stimulus transformations.

We begin by considering an example of a stimulus (Chubb & Sperling, 1987, 1988) that is microbalanced under all pointwise transformations, but whose motion can be revealed by a purely temporal nonlinear transformation.

3.1. Stimulus J: Traveling reversal of a random black-or-white vertical bar pattern. Let M ∈ Z⁺. We construct the random stimulus J of M+1 frames indexed 0, 1, ..., M, each of which contains M vertical bars, indexed 1, 2, ..., M from left to right. In frame 0 of stimulus J, all M vertical bars first appear. The contrast of each bar is 1 or -1 with equal probability, and bar contrasts are jointly independent. In each successive frame m, m = 1, 2, ..., M, the m-th rectangle flips its contrast to 1 if its previous contrast was -1; otherwise it flips from 1 to -1. In frame 1, rectangle 1 flips contrast; in frame 2, rectangle 2 flips, and in successive frames, successive rectangles flip contrast from left to right, until the M-th rectangle flips in frame M, after which all the rectangles turn off. An xt cross-section of frames 0 to M of J is shown in Fig. 2a.
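The construction of J can be sketched as follows. The function name `make_stimulus_j`, the list-of-frames representation, and the seeded generator are assumptions made for this illustration.

```python
import random


def make_stimulus_j(num_bars, rng):
    """Return frames 0..M of J; frames[m][k - 1] is the contrast of bar k in frame m."""
    contrasts = [rng.choice((1, -1)) for _ in range(num_bars)]
    frames = [list(contrasts)]                 # frame 0: all bars appear
    for m in range(1, num_bars + 1):
        contrasts[m - 1] = -contrasts[m - 1]   # bar m flips its contrast in frame m
        frames.append(list(contrasts))
    return frames


frames = make_stimulus_j(5, random.Random(0))
for frame in frames:
    # One text row per frame gives a crude xt cross-section.
    print("".join("#" if c == 1 else "." for c in frame))
```

Each printed row differs from the previous one in exactly one bar, and the position of that change moves one bar rightward per frame.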

FIG 2

Fig. 2. Exposing the motion of the traveling contrast-reversal of the random black-or-white vertical bar pattern J to standard motion-analysis. (a) An xt cross-section of J. (b) An xt cross-section of the partial derivative of J with respect to time. (c) An xt cross-section of |∂J/∂t|. Each of J and ∂J/∂t is microbalanced. However, |∂J/∂t| is not. In particular, |∂J/∂t| has most of its energy at those frequencies whose velocity is equal to the velocity of the traveling contrast-reversal.

The traveling contrast-reversal, stimulus J, is easily expressed as a sum of pairwise independent, space-time separable random stimuli, all with expectation 0; thus propositions 2.3.3a and 2.3.4 imply that J is microbalanced. Moreover, it is easy to see that, because J's frames are comprised of only two values, any pointwise transformation of J merely serves to rescale each of J's frames, and to shift it by a constant: that is, for any f: R → R, f•J = λJ + K, where λ ∈ R, and K is a stimulus that assigns a constant value across all points at which J is nonzero. Clearly, f•J is another microbalanced random function (this follows easily from proposition 2.3.4). Thus, pointwise transformations fail to expose J's motion.
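The claim can be checked by brute force for a small J. The sketch below assumes that microbalance amounts to the expectation of the Reichardt output of Eq. (18) vanishing, i.e. E[I[a,t]·I[b,u]] = E[I[a,u]·I[b,t]] for all points a, b and times t, u. The nonlinearity `h` is an arbitrary illustrative choice with h(0) = 0, and all names are hypothetical.

```python
from itertools import product

M = 3  # bars 1..M, frames 0..M


def j_value(contrasts, bar, frame):
    """Contrast of `bar` in `frame`: the initial contrast, flipped once frame >= bar."""
    sign = -1 if frame >= bar else 1
    return sign * contrasts[bar - 1]


def expected_product(h, bar_a, frame_a, bar_b, frame_b):
    """E[h(J[a, t]) * h(J[b, u])], averaged over all 2**M equally likely realizations."""
    total = 0.0
    for contrasts in product((1, -1), repeat=M):
        total += (h(j_value(contrasts, bar_a, frame_a))
                  * h(j_value(contrasts, bar_b, frame_b)))
    return total / 2 ** M


def h(value):
    """Arbitrary pointwise nonlinearity with h(0) = 0 (illustrative values)."""
    return {1: 3.0, -1: -0.5, 0: 0.0}[value]


max_gap = max(
    abs(expected_product(h, a, t, b, u) - expected_product(h, a, u, b, t))
    for a in range(1, M + 1) for b in range(1, M + 1)
    for t in range(M + 1) for u in range(M + 1)
)
print(max_gap)
```

A gap of zero for every pair of points and times is what one expects here: distinct bars are independent, so swapping the two times leaves the expected product unchanged.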

Exposing J's motion to standard analysis. Perhaps the simplest way to extract J's motion is to full-wave rectify the partial derivative of J taken with respect to time. The stages of this transformation are illustrated in Figs. 2b and 2c. Fig. 2b shows ∂J/∂t. This function is itself microbalanced (propositions 2.3.2 II and 2.3.3a imply that any purely temporal LSI transformation of a microbalanced random stimulus is microbalanced). However, |∂J/∂t| (Fig. 2c) has most of its energy at those spatiotemporal frequencies whose velocity is equal to the velocity of the traveling contrast-reversal whose motion we wish to detect. Thus we see that, although J's motion cannot be exposed to standard analysis by a simple pointwise transformation, a temporal linear filter followed by a pointwise nonlinearity does suffice.
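A discrete sketch of this two-stage transformation is given below. It uses a frame-to-frame difference as a stand-in for ∂J/∂t, which is an assumption of the example, as are the function names and seeds.

```python
import random


def make_j(num_bars, rng):
    """Frames 0..M of stimulus J as lists of bar contrasts."""
    contrasts = [rng.choice((1, -1)) for _ in range(num_bars)]
    frames = [list(contrasts)]
    for m in range(1, num_bars + 1):
        contrasts[m - 1] = -contrasts[m - 1]
        frames.append(list(contrasts))
    return frames


def rectified_temporal_difference(frames):
    """|J[x, t] - J[x, t - 1]| for t = 1..M: temporal difference, then full-wave rectifier."""
    return [[abs(now - before) for now, before in zip(frames[t], frames[t - 1])]
            for t in range(1, len(frames))]


outputs = [rectified_temporal_difference(make_j(4, random.Random(seed)))
           for seed in range(3)]
for row in outputs[0]:
    print(row)
```

Whatever the random bar contrasts, the rectified output is the same single nonzero value stepping rightward one bar per frame, which is a signal that standard motion analysis can pick up.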

We turn now to the problem of stipulating the general conditions that a random stimulus I must satisfy so that f•I will be microbalanced for any pointwise transformation f•. Call any random stimulus I microbalanced under a transformation T iff T(I) is microbalanced.

We state the following basic proposition (3.2) and its subsequent corollary (3.3) for continuously distributed random stimuli. The corresponding result for discretely distributed random stimuli is simpler and should be evident.

3.2. Necessary and sufficient conditions for a random stimulus to be microbalanced under all pointwise transformations. Let I be a random stimulus such that for any (x,y,t), (x',y',t') ∈ Z³, the pair (I[x,y,t], I[x',y',t']) has a continuous joint density. Then the following conditions are equivalent:

1. I is microbalanced under all pointwise transformations.

2. For all x, y, t, x', y', t' ∈ Z, the joint density f of (I[x,y,t], I[x',y',t']) and the joint density g of (I[x,y,t'], I[x',y',t]) satisfy

$$f(p,q) + f(q,p) = g(p,q) + g(q,p) \tag{19}$$

for any p, q ∈ R such that p ≠ 0 and q ≠ 0.


Proof. Set κ = I[x,y,t], λ = I[x',y',t'], γ = I[x,y,t'], and ν = I[x',y',t]. Thus, (κ, λ) is distributed in R² with density f and (γ, ν) is distributed with density g.

(2. implies 1.): By definition of any pointwise transformation h• we have h(0) = 0. Thus we need integrate only over values of κ and λ which are both nonzero in computing the expectation E[h(κ)h(λ)]. In particular, if Eq. (19) is satisfied for all p ≠ 0 and q ≠ 0, then h•I is microbalanced since

$$\begin{aligned}
E[h(\kappa)h(\lambda)] &= \tfrac{1}{2}\iint_{\mathbb{R}^2} h(p)h(q)\,f(p,q)\,dp\,dq + \tfrac{1}{2}\iint_{\mathbb{R}^2} h(q)h(p)\,f(q,p)\,dq\,dp \\
&= \tfrac{1}{2}\iint_{\mathbb{R}^2} h(p)h(q)\,f(p,q)\,dp\,dq + \tfrac{1}{2}\iint_{\mathbb{R}^2} h(p)h(q)\,f(q,p)\,dp\,dq \\
&= \tfrac{1}{2}\iint_{\mathbb{R}^2} h(p)h(q)\,\bigl(f(p,q)+f(q,p)\bigr)\,dp\,dq \\
&= \tfrac{1}{2}\iint_{\mathbb{R}^2} h(p)h(q)\,\bigl(g(p,q)+g(q,p)\bigr)\,dp\,dq = E[h(\gamma)h(\nu)].
\end{aligned}$$

(Note: the boundedness and finite integrability of h• ensure that these expectations exist.)


(Not 2. implies not 1.): On the other hand, suppose Eq. (19) fails for some x, y, t, x', y', t' ∈ Z. One way in which this might happen is if f(r,r) > g(r,r) for some nonzero r ∈ R. In this case, there exists a neighborhood N of r, not including 0, such that f(m,n) > g(m,n) for all m, n ∈ N. Thus, for the function h: R → R defined by

$$h(n) = \begin{cases} 1 & \text{if } n \in N, \\ 0 & \text{otherwise,} \end{cases}$$

h• is a pointwise transformation (the function h is bounded on R, finitely integrable, and h(0) = 0). However, h•I is not microbalanced since

$$E[h(\kappa)h(\lambda)] = \iint_{N \times N} f(m,n)\,dm\,dn \;>\; \iint_{N \times N} g(m,n)\,dm\,dn = E[h(\gamma)h(\nu)]. \tag{22}$$

To recapitulate, if Condition 2 fails because there exists a nonzero r ∈ R for which f(r,r) ≠ g(r,r), then Condition 1 fails (I is not microbalanced under all pointwise transformations).

The only other way in which Condition 2 can fail is if f(r,r) = g(r,r) for all r ≠ 0 in R, but for some p, q ∈ R, with neither p nor q equal to 0, f(p,q) + f(q,p) > g(p,q) + g(q,p). In this case, we obtain disjoint neighborhoods M of p and N of q, neither including 0, such that

$$f(m,n) + f(n,m) > g(m,n) + g(n,m) \tag{23}$$

for all m ∈ M, n ∈ N; consequently,

$$\int_M\!\int_N f(m,n) + f(n,m)\,dn\,dm \;>\; \int_M\!\int_N g(m,n) + g(n,m)\,dn\,dm. \tag{24}$$

Moreover, since (by assumption) f(p,p) = g(p,p) and f(q,q) = g(q,q), we can tailor the neighborhoods M and N to make the difference

$$\left[\iint_{M \times M} f(m,m')\,dm\,dm' + \iint_{N \times N} f(n,n')\,dn\,dn'\right] - \left[\iint_{M \times M} g(m,m')\,dm\,dm' + \iint_{N \times N} g(n,n')\,dn\,dn'\right] \tag{25}$$

as small as we want. Consider, then, the function h: R → R defined by

$$h(u) = \begin{cases} 1 & \text{if } u \in M \cup N, \\ 0 & \text{otherwise.} \end{cases}$$

Again, h• is a pointwise transformation. However, h•I fails again to be microbalanced because, for suitably tailored M and N,

$$\begin{aligned}
E[h(\kappa)h(\lambda)] &= \iint_{M \times M} f(u,v)\,du\,dv + \iint_{N \times N} f(u,v)\,du\,dv + \int_M\!\int_N f(u,v) + f(v,u)\,dv\,du \\
&> \iint_{M \times M} g(u,v)\,du\,dv + \iint_{N \times N} g(u,v)\,du\,dv + \int_M\!\int_N g(u,v) + g(v,u)\,dv\,du = E[h(\gamma)h(\nu)]. \quad ∎
\end{aligned}$$


3.3. Corollary. Let I be a random stimulus such that for all (x,y,t), (x',y',t') ∈ Z³, the pair (I[x,y,t], I[x',y',t']) has a continuous joint density. Then I is microbalanced under all pointwise transformations if the following condition holds for all x, y, t, x', y', t' ∈ Z. For f the joint density of (I[x,y,t], I[x',y',t']) and g the joint density of (I[x,y,t'], I[x',y',t]), either

$$f(p,q) = g(p,q) \quad \text{for all } p, q \in \mathbb{R},\ p \neq 0,\ q \neq 0, \tag{28}$$

or

$$f(p,q) = g(q,p) \quad \text{for all } p, q \in \mathbb{R},\ p \neq 0,\ q \neq 0. \tag{29}$$

Proof. If Eq. (28) holds for some (x,y,t), (x',y',t') ∈ Z³, then we also have

$$f(q,p) = g(q,p) \quad \text{for all } p, q \in \mathbb{R},\ p \neq 0,\ q \neq 0, \tag{30}$$

and we obtain Eq. (19) by adding Eq. (28) and Eq. (30). The same reasoning works for Eq. (29). ∎

A random stimulus microbalanced under all pointwise transformations, but quite different from J of example 3.1 is the following, suggested by J. Lai^ (1989).

3.4. Stimulus K: Rotating random dot cylinder. Construct K by taking the parallel projection of a set of points on (and/or inside) the surface of a cylinder rotating around a vertical axis. Let the contrast values of the points be independent, identically distributed random variables. As is well known, when properly constructed, K can display a very strong kinetic depth effect, with dots moving in one direction seen as being in the front of the axis of rotation, and dots moving in the other direction seen as being in the back (Ullman, 1979; Dosher, Landy, & Sperling, 1989). Nonetheless, K is microbalanced under all pointwise transformations: All of K's systematic motion is horizontal; thus, we can drop reference to y, and note that for any x, t, x', t', the joint distribution of (K[x,t], K[x',t']) is identical to that of (K[x,t'], K[x',t]). Hence, by Corollary 3.3, Eq. (28), K is microbalanced under all pointwise transformations.

4.  Texture  quilts. 

The rest of this paper is devoted to illustrating how the results of Section 3 can be applied to construct stimuli which display consistent apparent motion that cannot be exposed to standard motion analysis by any purely temporal transformation. We call such stimuli texture quilts (see Definition 4.1).


FIG 3

As illustrated in Fig. 3, the transformations that suffice to expose the motion of texture quilts to standard analysis are purely spatial linear filters s* followed by a rectifier r•:

$$T(Q) = r \bullet (s * Q). \tag{31}$$

The spatial filter s* will respond with high-energy output throughout some regions of the visual field, depending on whether or not the texture to which it is tuned populates those regions. However, the output of a linear filter to a texture is positive or negative depending on the local phase of the texture. The purpose of rectification is to transform regions of high-variance response into regions of high average value, thus insuring that the rectified output registers the presence or absence of texture, independent of phase. The result T(Q) is a spatiotemporal function whose value reflects the local texture preferences of s* in the visual field as a function of time (Bergen & Adelson, 1988; Caelli, 1985). [2]
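A one-dimensional sketch of the filter-then-rectify transformation of Eq. (31) follows. The two-tap kernel, the toy texture, and the use of the absolute value as the rectifier are assumptions chosen for illustration; they are not the filters used in the paper.

```python
def correlate_valid(signal, kernel):
    """Slide `kernel` along `signal`, keeping positions where it fits entirely."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def rectify(values):
    """Full-wave rectifier."""
    return [abs(v) for v in values]


kernel = [1, -1]                       # hypothetical spatial filter tuned to fine alternation
texture = [1, -1, 1, -1, 0, 0, 0, 0]   # texture on the left, blank field on the right

responses = {}
for sign in (1, -1):                   # the two equally likely contrast polarities
    frame = [sign * v for v in texture]
    linear = correlate_valid(frame, kernel)
    responses[sign] = (linear, rectify(linear))

mean_linear = [(a + b) / 2 for a, b in zip(responses[1][0], responses[-1][0])]
mean_rectified = [(a + b) / 2 for a, b in zip(responses[1][1], responses[-1][1])]
print(mean_linear)
print(mean_rectified)
```

Averaged over the two polarities, the linear response cancels everywhere, while the rectified response stays large wherever the texture is present and is zero over the blank field.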

The essential trick in all the quilt examples we consider is to patch together various brief displays of static, random texture, taking appropriate measures to ensure that the resultant stimulus satisfies the following definition.

4.1. Definition of a texture quilt. Let A ⊂ Z² be a set of points in space, and let t_0, t_1, ..., t_N be a strictly increasing sequence of times, with T = {t | t_0 ≤ t < t_N}. Call any random stimulus Q satisfying the following conditions a texture quilt:

(i) Q assigns 0 to all points outside A × T.

[2] In general, a spatial linear filter followed by a pointwise nonlinearity can have arbitrarily high order Volterra kernels, depending on the order of the Taylor series of the pointwise transformation. However, if we take the rectifier of step (2) to be Rect(x) = x², then this squared output of a spatial filter is a second order spatial transformation. Standard motion analysis is yet another second order transformation. Thus, when we subject the squared filter output to standard motion analysis, we are applying a fourth order operator.

Fig. 3. Fourier and nonFourier motion mechanisms. (a) Fourier motion mechanisms apply standard motion-analysis directly to the luminance signal L. (b, c, d) NonFourier mechanisms apply standard motion analysis to a nonlinear transformation of luminance. (b) A simple nonFourier mechanism applies a signal transformation comprised of a spatiotemporal linear filter, followed by a pointwise nonlinearity. The *'s indicate spatial and temporal convolution, respectively, and • indicates multiplication. The filtering performed in (b) is roughly pointwise in time (the temporal impulse response b2 approximates an impulse), and the nonlinearity applied is a full-wave rectifier. This system (with appropriately chosen spatial filter, b1) will extract the motion of the texture quilts shown in Figs. 4b, 5d, 6c, and 6d. It will not extract the motion of stimulus J, the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a. (c) A spatially pointwise system (the spatial impulse response c1 approximates an impulse) with a flicker-sensitive temporal filter and a full-wave rectifier. Because of the flicker sensitivity, this mechanism will extract the motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a but not the motion of the texture quilts shown in Figs. 4b, 5d, 6c, and 6d. (d) The temporal filter d2 averages the temporal filters b2 and c2, and the pointwise nonlinearity is a full-wave rectifier. With an appropriate spatial filter d1, this nonFourier system extracts the motion of any corresponding texture quilt as well as the motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a. However, it would be less well-suited to these tasks than the detectors shown in (b) and (c) whose temporal filters it averages.

(iv) Symmetry. For any α, β ∈ A, and any t ∈ T, the joint distribution of (Q[α,t], Q[β,t]) is identical to the joint distribution of (Q[β,t], Q[α,t]).

Terminology: Call A and T respectively Q's spatial and temporal regions of activity, and for i = 0, 1, ..., N-1, call {t | t_i ≤ t < t_{i+1}} the i-th timeblock of Q.

The empirical usefulness of texture quilts derives from proposition 4.3 in conjunction with the fact that it is easy to construct various sorts of texture quilts which display consistent apparent motion across independent realizations. The proof of proposition 4.3 is eased by the following lemma.


4.2. Lemma. Let Q be a texture quilt with spatial region of activity A. Then for any α, β ∈ A, the pair of temporal functions (Q_α, Q_β) is distributed identically to the reverse pair (Q_β, Q_α).


Proof. From Definition 4.1 (i) and (ii), note that for temporal functions P and R, the density of the joint assignment (Q_α, Q_β) = (P, R) is 0 unless each of P and R is constant throughout each timeblock, and 0 outside T. Thus, any P and R for which the joint assignment (Q_α, Q_β) = (P, R) has nonzero density are completely determined by the values P[t_i] = p_i and R[t_i] = r_i, for i = 0, 1, ..., N-1. For f_i the joint density of (Q_α[t_i], Q_β[t_i]), Definition 4.1 (iii) thus implies that the density of the joint assignment (Q_α, Q_β) = (P, R) is

$$\prod_{i=0}^{N-1} f_i(p_i, r_i). \tag{33}$$

But by Definition 4.1 (iv), the quantity (33) is equal to

$$\prod_{i=0}^{N-1} f_i(r_i, p_i), \tag{34}$$

which is the density of the reverse occurrence that (Q_β, Q_α) = (P, R). ∎

4.3. Texture quilts are microbalanced under purely temporal transformations.

I. Any texture quilt with a continuous joint density is microbalanced under all purely temporal, continuous transformations.

II. Any discretely distributed texture quilt is microbalanced under all purely temporal transformations.

Proof of I. Let Q be a texture quilt with a continuous joint density, and let Φ be an arbitrary purely temporal, continuous transformation. We must prove that Φ(Q) is microbalanced. We can, of course, accomplish this by proving that Φ(Q) is microbalanced under all pointwise transformations (since, in particular, the identity transformation is pointwise). This turns out to be a convenient approach.

Let α, β be points in space, and let t and u be points in time. Because Φ is bounded and continuous and Q has a continuous joint density, we know that the joint density f of (Φ(Q)[α,t], Φ(Q)[β,u]) and the joint density g of (Φ(Q)[β,t], Φ(Q)[α,u]) both exist and are continuous on R². We shall show for any (p,r) ∈ R² with neither p nor r equal to 0, that either f(p,r) = g(p,r) or f(p,r) = g(r,p). The proposition will then follow from corollary 3.3.

Case 1: At least one of α or β is outside A. Suppose α is outside A. Then by Definition 4.1 (i), Q_α = 0; hence Φ(Q)[α,t] = Φ(Q)[α,u] = 0. Consequently, f(p,r) = g(r,p) = 0 whenever p ≠ 0. Thus Eq. (29) holds vacuously, with

$$f(p,r) = g(r,p) = 0 \quad \text{for all } p, r \in \mathbb{R},\ p \neq 0,\ r \neq 0. \tag{35}$$

Case 2: Both α and β are in A. Let F be the joint density of (Q_α, Q_β) and G the joint density of (Q_β, Q_α). By lemma 4.2, F = G. Clearly, then, for F* the joint density of (Φ(Q_α), Φ(Q_β)) and G* the joint density of (Φ(Q_β), Φ(Q_α)), it follows that F* = G*. For any p, r ∈ R, recall that f(p,r) is the density of the co-occurrence that Φ(Q)[α,t] = p and Φ(Q)[β,u] = r. But this is precisely the density of the event that (Φ(Q_α)[t], Φ(Q_β)[u]) = (p,r). This density, however, is equal to the integral of F* over all pairs of temporal functions (P, R) such that P[t] = p and R[u] = r. Similarly, g(p,r) is the density of the co-occurrence that Φ(Q)[β,t] = p and Φ(Q)[α,u] = r, but this is the density of the event that (Φ(Q_β)[t], Φ(Q_α)[u]) = (p,r), which is equal to the integral of G* over all pairs of temporal functions (P, R) such that P[t] = p and R[u] = r. However, as we have already noted, F* = G*, implying that f = g. Apply corollary 3.3 to complete the proof. ∎

The proof of II is similar.

The rest of Section 4 is devoted to showing how to construct two kinds of simple texture quilts. In Section 5, we apply these construction techniques in an experiment to investigate what sorts of textural characteristics are actually processed for motion information by the visual system.

4.4.  Binary  texture  quilts. 

4.4.1. A general technique for constructing binary texture quilts. The simplest sorts of texture quilts involve only two contrast values. As in Definition 4.1, let T = {t | t_0 ≤ t < t_N} be the temporal region of activity, with new timeblocks beginning at times t_0, t_1, ..., t_{N-1}. Let A be the spatial region of activity. Associate with timeblocks i = 0, 1, ..., N-1 spatial functions f_i (called timeblock pictures), each of which is 0 everywhere outside A, and takes only the values 1 and -1 within A. In addition, associate with timeblocks 0 through N-1 a family

$$\phi_0, \phi_1, \ldots, \phi_{N-1} \tag{36}$$

of jointly independent random variables, each of which takes the value 1 or -1 with equal probability. Then, for i = 0, 1, ..., N-1, set

$$B_i[x,y,t] = \begin{cases} f_i[x,y] & \text{if } t \text{ is in timeblock } i, \\ 0 & \text{otherwise,} \end{cases} \tag{37}$$

and construct the random stimulus

$$B = \phi_0 B_0 + \phi_1 B_1 + \cdots + \phi_{N-1} B_{N-1}. \tag{38}$$

It is easy to see that B is a texture quilt. First, the functions B_i are defined to satisfy Definition 4.1 (i) and (ii). The joint independence of the random variables φ_i ensures that B satisfies Definition 4.1 (iii). To see that Definition 4.1 (iv) is satisfied, note that for any α, β ∈ A, either (i) B_i[α,t_i] = B_i[β,t_i] or (ii) B_i[α,t_i] = -B_i[β,t_i]. In case (i),

$$B[\alpha,t_i] = \phi_i B_i[\alpha,t_i] = \phi_i B_i[\beta,t_i] = B[\beta,t_i], \tag{39}$$

implying that the pair (B[α,t_i], B[β,t_i]) is distributed identically to the pair (B[β,t_i], B[α,t_i]) (each pair with an equal probability of taking the value (1,1) or (-1,-1)). In case (ii),

$$B[\alpha,t_i] = -B[\beta,t_i], \tag{40}$$

and the pair (B[α,t_i], B[β,t_i]) is distributed identically to the pair (B[β,t_i], B[α,t_i]), each with an equal probability of assuming the value (1,-1) or (-1,1). Thus Definition 4.1 (iv) is satisfied along with 4.1 (i), (ii) and (iii).
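The argument can be checked exhaustively on a toy example. The sketch below uses three hypothetical timeblock pictures on a four-point region and enumerates every assignment of the signs φ_i; the picture values and all names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical timeblock pictures f_i on a 4-point spatial region A (values +1 or -1).
pictures = [
    [-1, -1, -1, -1],
    [1, -1, -1, -1],
    [1, 1, -1, -1],
]


def quilt_value(signs, block, point):
    """B[point, t] for any t in timeblock `block`: phi_block * f_block[point]."""
    return signs[block] * pictures[block][point]


def joint_distribution(block_a, point_a, block_b, point_b):
    """Exact joint distribution of (B[a, t_i], B[b, t_j]) over all sign assignments."""
    outcomes = list(product((1, -1), repeat=len(pictures)))
    counts = {}
    for signs in outcomes:
        pair = (quilt_value(signs, block_a, point_a),
                quilt_value(signs, block_b, point_b))
        counts[pair] = counts.get(pair, 0) + 1 / len(outcomes)
    return counts


blocks = range(len(pictures))
points = range(len(pictures[0]))

# Definition 4.1 (iv): within a timeblock, (B[a], B[b]) and (B[b], B[a]) share a distribution.
symmetric = all(
    joint_distribution(i, a, i, b) == joint_distribution(i, b, i, a)
    for i in blocks for a in points for b in points
)

# Exchanging the two timeblocks leaves the joint distribution of the pair unchanged.
swap_invariant = all(
    joint_distribution(i, a, j, b) == joint_distribution(j, a, i, b)
    for i in blocks for j in blocks for a in points for b in points
)
print(symmetric, swap_invariant)
```

The second check is the discrete counterpart of Eq. (28) applied to B itself: the pair sampled at times (t_i, t_j) has the same distribution as the pair sampled at (t_j, t_i).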

4.4.2. Stimulus: The sidestepping, randomly contrast-reversing, vertical edge. In Fig. 4b are displayed the 9 timeblock pictures comprising a particularly simple binary texture quilt. Note that the vertical dimension of Fig. 4b combines time and vertical space, precisely as a strip of movie film, scanned vertically, combines time and space. Timeblock pictures are separated by grey lines. Fig. 4a shows the timeblock pictures f_0 through f_8 used in the construction. f_0 assigns the value -1 to all points (x,y) of the horizontal rectangle comprising the spatial region of activity, A. f_1 assigns 1 to the points in the leftmost eighth of A, and -1 to the points in the right seven eighths. The timeblock pictures f_2 through f_8 continue to shift the vertical edge rightward through A until, in picture 8, A is uniformly 1. Multiplying each timeblock picture i = 0, 1, ..., 8 by its associated random variable φ_i yields, in this particular realization, the stimulus given in Fig. 4b.

FIG  4 

The construction of the side-stepping contrast-reversing edge (Fig. 4b) is symmetric to the construction of the traveling contrast-reversal of a random black-or-white vertical bar pattern (J in Fig. 2a). Transposing the x and t dimensions in Fig. 4b gives the xt cross-section of a random stimulus J (e.g., Fig. 2a). This stimulus exhibits an unusual symmetry between space and time. Whereas the texture quilt of Fig. 4b is microbalanced under all purely temporal transformations, its transpose J (Fig. 2a) is microbalanced under all purely spatial transformations. Extracting motion from J requires temporal filtering followed by a nonlinearity. This process is essentially different from the process by which motion is extracted from texture quilts (e.g., Figs. 4b, 7a, 7b and 7c), which requires a spatial nonlinearity.

4.4.3. Stimulus: Oppositely oriented static squarewaves selected by a drifting grating. Figure 5d shows the four timeblock pictures comprising another binary texture quilt constructed using technique 4.4.1. In Fig. 5a is shown a probabilistically defined sinewave grating, a stimulus whose motion is readily extracted by standard motion analysis. In Figs. 5b1 and 5b2 are shown static vertical and horizontal squarewave gratings. The stimulus of Fig. 5c is obtained by using Fig. 5a to select between the vertical and horizontal gratings of Figs. 5b1 and 5b2. If the function of Fig. 5a is 1 at a certain point in space-time, the corresponding point in Fig. 5c is assigned the value of the corresponding point in Fig. 5b1; otherwise the point in Fig. 5c is assigned the value of the corresponding point in Fig. 5b2. Although Figs. 5c and 5d look similar, they differ in an important respect: the stimulus of Fig. 5d is microbalanced under all purely temporal transformations, while that of Fig. 5c is not microbalanced. It is possible to design Fourier mechanisms to detect the motion of Fig. 5c, but not that of Fig. 5d. The critical difference is that the timeblock pictures of Fig. 5d are jointly independent, while those of Fig. 5c are not: Fig. 5d is obtained by randomly reversing the contrasts of the timeblock pictures of Fig. 5c.

Fig. 4. Edge-driven motion from an ordinary edge and from a binary texture quilt. (a) A rightward moving light/dark edge visible to Fourier and nonFourier motion systems. Nine entire frames are shown; each frame consists of an area of contrast +1 and an area of contrast -1. (b) A realization of the sidestepping, randomly contrast-reversing vertical edge. This random stimulus is a texture quilt, microbalanced under all purely temporal transformations; that is, its rightward motion would be unavailable to standard motion analysis even if this analysis were preceded by an arbitrary, purely temporal transformation. Each frame of (b) was derived from the corresponding frame of (a) by multiplying it by a random variable that takes the value 1 or -1 with equal probability. The frame random variables are jointly independent. A straightforward way to extract the motion of this texture quilt is to (i) apply a linear filter sensitive to vertical edges, (ii) rectify the filtered output, and (iii) submit the result to standard motion analysis.

FIG 5

4.5.  Sinusoidal  texture  quilts. 

It is not difficult to elaborate technique 4.4.1 to a method for constructing quilts involving textures of arbitrarily many contrast values. We illustrate the principle in the construction of quilts comprised of patches of sinusoidal grating.

4.5.1. A general technique for constructing sinusoidal texture quilts. As in Definition 4.1, let T = {t | t_0 ≤ t < t_N} be the temporal region of activity, with new timeblocks beginning at times t_0, t_1, ..., t_{N-1}. Let A be the spatial region of activity. Associate with timeblocks i = 0, 1, ..., N-1 spatial functions W_i, each of which is 0 everywhere outside A, and takes only the values 1 and -1 within A. The stimulus in each timeblock will be composed of two components characterized by spatial frequencies (ω_i, θ_i) and (ω̃_i, θ̃_i), respectively, and independent phases ρ_i, ρ̃_i, respectively. Let

$$\omega_0, \theta_0, \tilde\omega_0, \tilde\theta_0,\ \omega_1, \theta_1, \tilde\omega_1, \tilde\theta_1,\ \ldots,\ \omega_{N-1}, \theta_{N-1}, \tilde\omega_{N-1}, \tilde\theta_{N-1} \tag{41}$$

be integers. Let P be an integer, and let

$$\rho_0, \tilde\rho_0,\ \rho_1, \tilde\rho_1,\ \ldots,\ \rho_{N-1}, \tilde\rho_{N-1} \tag{42}$$

be jointly independent random variables, each uniformly distributed on the set {0, 1, ..., P-1}. Then, define the stimulus S as the sum of N component stimuli S_i defined in each timeblock:

$$S = \sum_{i=0}^{N-1} S_i, \tag{43}$$

where, for i = 0, 1, ..., N-1, S_i is zero everywhere outside timeblock i; and for all t in timeblock i,

$$S_i[x,y,t] = f_i[x,y] = \begin{cases} \cos\bigl(2\pi(\omega_i x + \theta_i y - \rho_i)/P\bigr) & \text{if } W_i[x,y] = 1, \\ \cos\bigl(2\pi(\tilde\omega_i x + \tilde\theta_i y - \tilde\rho_i)/P\bigr) & \text{if } W_i[x,y] = -1, \\ 0 & \text{otherwise.} \end{cases} \tag{44}$$

Fig. 5. Orientation-driven nonFourier motion from a binary texture quilt. (a) A probabilistically defined sinewave grating that steps rightward 90 degrees between frames. The rightward motion in (a) is accessible to all motion detectors. (b1) Four frames of a static, vertical squarewave grating. (b2) Four frames of a static horizontal squarewave grating. (c) A rightward translating texture pattern. For every white point in (a), the corresponding value in (c) is chosen from the vertical square-wave grating in (b1); for every black point in (a), the corresponding value in (c) is chosen from the horizontal square-wave grating in (b2). (c) is not microbalanced; standard motion-analyzers can be designed to detect its motion. (d) A texture quilt. The frames of (d) are derived by multiplying the corresponding frames of (c) by jointly independent random variables, each of which takes the value 1 or -1 with equal probability. The texture quilt (d) is microbalanced under all purely temporal transformations, and therefore its rightward motion is unavailable to any mechanism that applies standard motion analysis to a purely temporal transformation of the visual signal.

It is easy to check that S satisfies Definition 4.1 (i) and (ii). The joint independence of the random phase variables ρ_i, ρ̃_i, for i = 0, 1, ..., N-1 entails Definition 4.1 (iii).

It remains to check that S satisfies Definition 4.1 (iv). Consider points α, β ∈ A. If W_i[α] ≠ W_i[β], then, as is easily checked, S[α,t_i] and S[β,t_i] are independent and identically distributed (each assuming a value from among {cos(2πp/P) | p = 0, 1, ..., P-1} with equal probability). On the other hand, if W_i[α] = W_i[β], then the pair (S[α,t_i], S[β,t_i]) is distributed identically to the pair (S[β,t_i], S[α,t_i]) as a consequence of the following

Lemma. Let P ∈ Z, and let α = (α_x, α_y), β = (β_x, β_y) and ω = (ω_x, ω_y) be elements of Z². Then for any integer p ∈ {0, 1, ..., P-1}, there exists an integer q ∈ {0, 1, ..., P-1} such that (writing · for dot product)

$$\cos\bigl(2\pi(\omega\cdot\alpha - p)/P\bigr) = \cos\bigl(2\pi(\omega\cdot\beta - q)/P\bigr) \tag{45}$$

and

$$\cos\bigl(2\pi(\omega\cdot\beta - p)/P\bigr) = \cos\bigl(2\pi(\omega\cdot\alpha - q)/P\bigr). \tag{46}$$

Proof. As the reader may check, this is true for q = (ω·α + ω·β - p) modulo P. ∎
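The check left to the reader can also be run numerically over a small grid of integer vectors. The grid size, the choice P = 7, and the function names below are assumptions of this sketch; it illustrates the lemma and does not replace the algebraic argument.

```python
import math
from itertools import product


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def partner_phase(omega, alpha, beta, p, period):
    """The q proposed in the proof: (omega.alpha + omega.beta - p) mod period."""
    return (dot(omega, alpha) + dot(omega, beta) - p) % period


def worst_violation(period, span=2):
    """Largest absolute error in Eqs. (45) and (46) over a small integer grid."""
    coords = range(-span, span + 1)
    worst = 0.0
    for ox, oy, ax, ay, bx, by in product(coords, repeat=6):
        omega, alpha, beta = (ox, oy), (ax, ay), (bx, by)
        for p in range(period):
            q = partner_phase(omega, alpha, beta, p, period)

            def c(point, phase):
                return math.cos(2 * math.pi * (dot(omega, point) - phase) / period)

            worst = max(worst,
                        abs(c(alpha, p) - c(beta, q)),
                        abs(c(beta, p) - c(alpha, q)))
    return worst


print(worst_violation(period=7))
```

The reason it works: ω·β - q is congruent to p - ω·α modulo P, and the cosine is even, so the two sides of Eq. (45) agree up to floating-point rounding; Eq. (46) follows in the same way with α and β exchanged.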

Thus, for α, β such that W_i[α] = W_i[β], we observe that for any outcome ρ_i = p, there exists an equally likely outcome ρ_i = q, such that

$$\Bigl(\cos\bigl(2\pi(\omega_i\cdot\alpha - p)/P\bigr),\ \cos\bigl(2\pi(\omega_i\cdot\beta - p)/P\bigr)\Bigr) = \Bigl(\cos\bigl(2\pi(\omega_i\cdot\beta - q)/P\bigr),\ \cos\bigl(2\pi(\omega_i\cdot\alpha - q)/P\bigr)\Bigr). \tag{47}$$

We infer that the pair (S[α,t_i], S[β,t_i]) is distributed identically to the pair (S[β,t_i], S[α,t_i]).
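One timeblock picture of the kind defined by Eq. (44) can be generated as follows. The selector layout, the frequency pairs, the period, and the function name are all hypothetical choices for illustration.

```python
import math
import random


def sinusoidal_picture(selector, freq_1, freq_2, period, rng):
    """Timeblock picture in the spirit of Eq. (44).

    `selector` is a list of rows of W_i values (1, -1, or 0 outside A);
    `freq_1` and `freq_2` are integer (x, y) frequency pairs.
    """
    phase_1 = rng.randrange(period)   # independent uniform phases on {0, ..., period - 1}
    phase_2 = rng.randrange(period)
    picture = []
    for y, row in enumerate(selector):
        values = []
        for x, w in enumerate(row):
            if w == 1:
                arg = freq_1[0] * x + freq_1[1] * y - phase_1
                values.append(math.cos(2 * math.pi * arg / period))
            elif w == -1:
                arg = freq_2[0] * x + freq_2[1] * y - phase_2
                values.append(math.cos(2 * math.pi * arg / period))
            else:
                values.append(0.0)
        picture.append(values)
    return picture


# Hypothetical 4 x 8 region: left half selects component 1, right half component 2.
selector = [[1] * 4 + [-1] * 4 for _ in range(4)]
picture = sinusoidal_picture(selector, (0, 2), (2, 0), period=8, rng=random.Random(0))
```

With these illustrative frequencies the left half varies only along y and the right half only along x. A full quilt would call the function once per timeblock, so that each timeblock draws fresh, independent phases.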

4.52.  Stimulus:  Oppositely  oriented  static  sinusoids  selected  by  a  drifting  grating.  The 
sinusoidal  analog  to  the  binary  texture  quill  of  Fig.  Sd  is  shown  in  Fig.  6b.  In  Fig.  6a  are  shown  the 
functions  IVi,  IVj.  Wj,  and  used  to  select  between  horizontal  and  vertical  graungs.  For  this  quilt, 
$\omega_{i,y} = \theta_{i,x} = 0$, for $i = 1, 2, 3, 4$; and for some integer $F$ (with $F/P$ the number of cycles per pixel),
$\omega_{i,x} = \theta_{i,y} = F$. The texture quilt of Fig. 6b modulates textural orientation across space and time.
Alternatively, we can just as easily keep orientation constant and vary spatial frequency.
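The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' stimulus code: the function name `sinusoidal_quilt`, the image size, the values of `P`, `F` and `bar_width`, and the choice of a selector that drifts a quarter cycle on every frame are all hypothetical. What it does take from the text is that a drifting squarewave selects, pixel by pixel, between a vertical and a horizontal sinusoid of the same frequency, and that the integer phases are drawn independently for each frame and each component.

```python
import numpy as np


def sinusoidal_quilt(n_frames=8, height=32, width=32, P=16, F=2, bar_width=8, seed=0):
    """Return an (n_frames, height, width) array sketching an orientation quilt."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:height, 0:width]
    quilt = np.empty((n_frames, height, width))
    for t in range(n_frames):
        # Selecting squarewave (period 2 * bar_width) drifting rightward by a
        # quarter cycle per frame; its value 0 or 1 picks the texture.
        selector = ((x - t * (bar_width // 2)) // bar_width) % 2
        # Independent, uniformly distributed integer phases in {0, ..., P-1}.
        rho_v, rho_h = rng.integers(0, P, size=2)
        vertical = np.cos(2 * np.pi * (F * x - rho_v) / P)    # frequency vector (F, 0)
        horizontal = np.cos(2 * np.pi * (F * y - rho_h) / P)  # frequency vector (0, F)
        quilt[t] = np.where(selector == 0, vertical, horizontal)
    return quilt
```

Every pixel value lies in the set $\{\cos(2\pi p/P)\}$, as in the distributional argument of the preceding subsection. Replacing the horizontal component by a second vertical sinusoid of a different frequency gives the constant-orientation variant.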
FIG. 6

4.5.3. Stimulus: Static sinusoids of different spatial frequencies, selected by a drifting grating.
Figure 6c shows a texture quilt using the sampling functions of Fig. 6a, but setting
(0,  =  0;  =  25i  =  20, •  for  r  =  1, 2 . 4. 

5.  What  aspects  of  texture  does  the  visual  system  process  for  motion? 

In this section, we describe a psychophysical experiment investigating the question of what
characteristics of spatial texture are analyzed for motion information by the visual system. Three
texture quilts are compared across four different viewing conditions. These conditions comprise a
sequence of similar, but increasingly challenging motion discrimination tasks.

5.1. Procedure. Every texture quilt used in this experiment is composed of a sequence of jointly
independent timeblocks, each lasting 1/30 sec. (Each timeblock consists of two identical refreshes at
1/60 sec.) Each texture quilt is stochastically periodic with a period of 8 timeblocks; that is, for any
integer i, the i-th timeblock is identically distributed to the (i + 8)-th timeblock. Accordingly, we refer to
eight timeblocks of the texture quilt as one cycle. The motion elicited by each quilt is carried by a
squarewave that selects between two textures, and steps 1/4 cycle on every odd timeblock. The
squarewave thus completes one of its four-step cycles in each 8-timeblock cycle of the quilt.
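The timing implied by this procedure is simple arithmetic, worked through below as a small sketch. The constant and function names are illustrative assumptions, and timeblocks are numbered from 1 so that the first step of the squarewave falls on timeblock 1.

```python
TIMEBLOCKS_PER_SEC = 30    # each timeblock lasts 1/30 sec
TIMEBLOCKS_PER_CYCLE = 8   # stochastic period of the quilt


def duration_ms(quilt_cycles):
    """Display duration, in milliseconds, of a given number of quilt cycles."""
    return quilt_cycles * TIMEBLOCKS_PER_CYCLE * 1000 / TIMEBLOCKS_PER_SEC


def squarewave_temporal_frequency_hz():
    """One four-step squarewave cycle is completed per quilt cycle."""
    return TIMEBLOCKS_PER_SEC / TIMEBLOCKS_PER_CYCLE


def squarewave_phase(timeblock):
    """Phase (in cycles, modulo 1) at a 1-based timeblock index.

    The squarewave steps 1/4 cycle on every odd timeblock and holds its
    position on the following even timeblock.
    """
    steps_taken = (timeblock + 1) // 2
    return (0.25 * steps_taken) % 1.0
```

One quilt cycle therefore lasts 8/30 sec (about 267 ms), the selecting squarewave has a temporal frequency of 3.75 Hz, and displays of 0.5, 1, 1.5 and 2 quilt cycles last roughly 133, 266, 400 and 533 ms.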

On each trial, a texture quilt moving randomly left or right is presented, and the subject is
required to signal (with a button-press) which way the quilt appeared to move. The subject is asked
to maintain fixation on a small spot present in the middle of the stimulus throughout the display, and
receives feedback after each trial. For each quilt under each viewing condition, the subject performs
100 practice trials followed directly by 100 actual trials. Quilt realizations are jointly independent
across trials. The starting phase of the quilt is chosen randomly on each trial.

The four viewing conditions. For a given quilt, the four viewing conditions differ with respect
Fig. 6. Sinusoidal texture quilts: Motion driven by differences in orientation and in spatial frequency. (b)
and (c) show realizations of random stimuli, each of which is microbalanced under all purely temporal
transformations. Their rightward motion cannot be detected by any mechanism that applies standard
motion analysis to a purely temporal transformation of the signal. In each case, the 4 frames in (a) select
between two sinusoidal patterns. The phases of sinusoids are jointly independent across frames and across
different-frequency sinusoidal components patched together in the same frame. The sinusoids mixed in (b)
differ in orientation, whereas the sinusoids mixed in (c) have the same orientation, but differ in spatial
frequency.
to the number of quilt cycles displayed. In Condition 1, the stimulus is displayed for 0.5
quilt cycles (each cycle comprises 8 timeblocks, each of which is displayed for
1/30 sec). In Conditions 2, 3, and 4, the stimulus is displayed for 1, 1.5, and 2 quilt cycles, respectively.

5.1.1. Three quilt stimuli. The first quilt (the F-quilt) modulates textural spatial frequency as a
function of space and time, while keeping orientation constant. The eight frames comprising one
full cycle of the F-quilt are shown in Fig. 7a. A second quilt (the O-quilt, Fig. 7b) modulates textural
orientation as a function of space and time, keeping spatial frequency constant. A third quilt
(the E-quilt, Fig. 7c) spatiotemporally modulates texture between jointly independent binary noise
and the so-called "even" texture (Julesz, Gilbert & Victor, 1978).

All stimuli were viewed from 1 m against a mean-luminant background. At this distance, each
quilt spanned 6.8 horizontal and 3.2 vertical degrees, and the modulating square wave moved at an
average velocity of 12.75 deg/sec.


FIG. 7


5.1.2. Why these three quilts. In each of the three quilts, a squarewave with vertical bars is used to
modulate between two textures as a function of space and time. The squarewave has a spatial
frequency of 0.3 c/deg, and steps 1/4 cycle rightward on every odd timeblock (temporal frequency
3.75 Hz, velocity 12.75 deg/sec). We use a 1/4-cycle stepping squarewave to modulate between the
two textures comprising each quilt in order to rule out the possibility that the motion elicited by the
quilt is being carried by the border between textural regions. That is, the 1/4-cycle stepping
squarewave has the advantage that the signal derived from the borders between texture regions is
ambiguous in motion content. Given the requirement of 1/4 cycle steps, we changed the particular
instantiation of the quilt on even timeblocks (i.e., within steps of the squarewave) in order to spread
textural energy broadly in temporal frequency without altering the spatial frequency content of the
texture.
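The ambiguity of the border signal under 1/4-cycle steps can be seen in a one-dimensional sketch. The helper names, the 16-pixel period, and the use of a circular row are assumptions made for illustration. The borders of a squarewave are spaced half a cycle apart, so displacing them by a quarter cycle to the right lands them on exactly the same positions as displacing them by a quarter cycle to the left.

```python
import numpy as np


def squarewave(n_pixels, period, shift):
    """Binary squarewave (values 0/1) displaced rightward by `shift` pixels."""
    x = np.arange(n_pixels)
    return (((x - shift) % period) < period // 2).astype(int)


def border_map(row):
    """1 wherever neighbouring pixels differ (circularly), ignoring polarity."""
    return (row != np.roll(row, 1)).astype(int)


period, n = 16, 64
base = squarewave(n, period, 0)
right = squarewave(n, period, period // 4)    # quarter-cycle step rightward
left = squarewave(n, period, -(period // 4))  # quarter-cycle step leftward
```

Here `border_map(right)` and `border_map(left)` are identical, although `right` and `left` themselves are complements of each other. A mechanism that tracked only the location of texture borders would therefore receive the same input for either direction of motion.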
Fig. 7. Three quilts used to study motion carried by modulation of texture spatial frequency, by
orientation, and by higher order textural characteristics. (a) Eight frames that comprise one cycle of the
F-quilt. Motion is generated by a squarewave modulation of textural spatial frequency. The squarewave
grating selects between vertical sinusoidal gratings of spatial frequency 1.2 c/deg and 2.4 c/deg. The
texture-modulating squarewave is 0.3 c/deg, and steps 1/4 cycle rightward on every odd frame. Every even
frame is independent of and distributed identically to the preceding frame. Presentation proceeds at a
rate of 30 frames/sec. This gives the texture-modulating squarewave a temporal frequency of 3.75 Hz and
a mean velocity of 25 deg/sec.

(b) Eight frames that comprise one cycle of the O-quilt. In the O-quilt, textural orientation is modulated
by the same squarewave used to modulate spatial frequency in the F-quilt. The squarewave
selects between oppositely oriented sinusoidal gratings that have a spatial frequency of 2.8 c/deg.

(c) Eight frames that comprise one cycle of the E-quilt. In the E-quilt, the texture-modulating squarewave
selects between jointly independent binary noise and an "even" texture (Julesz, Gilbert & Victor, 1978).
Despite the evident difference between these two textures, every time-independent linear filter has the
same expected power for both textures. Thus, if motion-from-texture resulted from applying a simple
squaring transformation to the output of a spatial linear filter and submitting the result to standard motion
analysis, the motion of the E-quilt would be invisible.
It has been previously observed (Watson & Ahumada, 1983a; Ramachandran, Ginsburg &
Anstis, 1983; Green, 1986) that motion is generated more effectively by spatiotemporal modulation of
textural spatial frequency than by modulation of spatial orientation. The F-quilt and O-quilt were chosen
to investigate this observation further. The E-quilt was chosen because the two textures of which it is
composed (jointly independent binary noise and even texture) have identical second-order statistics.
That is, the distribution of any given pair of points in space is the same under both component
textures of the E-quilt. This means that, despite the obvious difference in appearance between
the component textures, the expected energy in the response of any given spatial linear filter is the
same for both component textures. If the pointwise nonlinearity applied to the output of the spatial
linear filter prior to motion analysis were simple squaring, it would be impossible to detect the motion
of the E-quilt.
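The equal-energy property can be checked exactly on small patches. The sketch below is illustrative and rests on an assumption not spelled out in this section: that the even texture is defined by the 2x2 parity rule (the product of the four ±1 pixels in every 2x2 block is +1), which makes every patch the outer product of a random ±1 column and a random ±1 row. The function names and the 3x3 patch size are hypothetical.

```python
import numpy as np
from itertools import product


def all_noise_patches(n=3):
    """Every n x n patch of +/-1 pixels; all are equally likely under binary noise."""
    for vals in product((-1, 1), repeat=n * n):
        yield np.array(vals).reshape(n, n)


def all_even_patches(n=3):
    """Every n x n patch of the even texture under the assumed 2x2 parity rule.

    Each distinct patch is generated exactly twice (once per global sign of
    the column/row pair), so the enumeration is uniform over patches.
    """
    for col in product((-1, 1), repeat=n):
        for row in product((-1, 1), repeat=n):
            yield np.outer(col, row)


def expected_energy(patches, h):
    """Mean squared response of the linear filter h over an ensemble of patches."""
    return float(np.mean([np.sum(h * p) ** 2 for p in patches]))
```

For any filter `h`, both `expected_energy(all_noise_patches(), h)` and `expected_energy(all_even_patches(), h)` equal `np.sum(h ** 2)`, because distinct pixels are pairwise uncorrelated in both textures.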

Victor and Conte (1990) studied apparent motion elicited by E-quilts, and noted that it is much
weaker than motion elicited by comparable stimuli (also texture quilts) that modulate between
textures differing in spatial frequency. Our experiment confirms this finding.

5.2. Results. Two subjects participated in the study, CC (the experimenter) and GA (naive). The
results for CC are shown in Fig. 8a and those for GA are shown in Fig. 8b. Note first that both
subjects were able to reliably discriminate left/right motion in all three stimuli, although subject GA failed
with the E-quilt at the briefest exposure. The two subjects performed comparably well at motion
direction discrimination of the O-quilt, but CC was much better than GA at detecting the motion of
both the F-quilt and the E-quilt. Subject CC was better at detecting the motion of the F-quilt than the
O-quilt; the reverse was true of subject GA.

It is possible that these performance differences reflect genuine differences in the perceptual
apparatus of the two subjects. However, we cannot rule out the possibility that the better performance
of subject CC is due merely to his vastly greater experience with motion perception tasks of this sort.
Fig. 8. The percent of correct direction-of-motion judgments to the F-quilt, the O-quilt, and the E-quilt as
a function of stimulus duration (msec). The panels show data for subjects CC and GA, respectively. Each data
point is the mean of 100 judgments. (Squares) F-quilt; (triangles) O-quilt; (circles) E-quilt. The stimulus
durations of 133, 266, 400, and 533 ms correspond to stimulus presentations of 0.5, 1, 1.5 and 2 quilt
cycles.
5.3. Discussion. Many of the models proposed to explain rapid, preattentive segregation of spatial
textures (Caelli, 1985; Beck, Sutter & Ivry, 1987; Sutter, Beck & Graham, 1989; Bergen & Adelson,
1988; Malik & Perona, 1989) can easily be adapted to deal with the motion displayed by texture
quilts. The texture segregation models in this class typically subject the visual input function to a
linear transformation (a 'texture grabber') followed by a pointwise nonlinearity (such as a rectifier or
thresholder) to indicate the presence or absence of the texture. Such models propose that two
contiguous textural regions would generate a perceptual boundary if the visual system were equipped
with a linear filter that is differentially tuned to one of the textures.

An analogous mechanism to detect the motion of texture quilts, suggested by the current
experiment and the work of Victor and Conte (1990), (i) convolves the input stimulus with a spatial
texture-grabbing filter tuned to the moving texture, then (ii) squares the output of the filter, to
transform regions of high-energy filter output into regions of high average value, and (iii) subjects the
rectified output to standard motion analysis. However, the transformation applied in steps (i) and (ii)
does not distinguish between the two textures comprising the E-quilt, and therefore fails to account
for the good performance with the E-quilt. A simple modification to deal with texture segregation
and motion perception of the E-quilt is to assume some other post-filter rectification operation than
the squaring operation. It is quite easy to choose a linear filter in combination with a post-filter
rectifier (other than the squaring operation) that will segregate the random and even textures (e.g.,
Julesz & Bergen, 1983). The current experiment does not specifically indicate the kind of
rectification that might be involved.
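One concrete filter-plus-rectifier pair that separates the two E-quilt textures can be worked out by enumeration. This is an illustrative sketch, not a claim about the rectifier the visual system uses: the 2x2 box filter, the function names, and the 2x2 parity definition of the even texture (pixel product +1 in every 2x2 block) are assumptions.

```python
import numpy as np
from itertools import product


def block_sums(even):
    """Responses of a 2x2 box filter to every equally likely 2x2 block of +/-1 pixels.

    With even=True, only blocks whose four pixels multiply to +1 are kept
    (the assumed definition of the even texture).
    """
    sums = []
    for vals in product((-1, 1), repeat=4):
        if even and np.prod(vals) != 1:
            continue
        sums.append(sum(vals))
    return np.array(sums, dtype=float)


def mean_rectified(sums, rectifier):
    """Expected value of a pointwise nonlinearity applied to the filter output."""
    return float(np.mean(rectifier(sums)))


noise, even = block_sums(even=False), block_sums(even=True)
squared = (mean_rectified(noise, np.square), mean_rectified(even, np.square))
full_wave = (mean_rectified(noise, np.abs), mean_rectified(even, np.abs))
```

Squaring yields the same expected output, 4.0, for both textures, whereas full-wave rectification yields 1.5 for binary noise and 1.0 for the even texture. Under these assumptions, a full-wave rectifier following this filter would give regions of the two textures different average values for subsequent motion analysis.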

What sorts of filters are available to the visual system to compute motion from texture? For
example, Daugman (1985) points out that (i) Gabor filters provide an optimal trade-off between
resolution in the space and spatial frequency domains, and (ii) many investigators note that simple cells in
cat striate cortex are well-modeled by oriented Gabor filters (e.g., Wilson & Sherman, 1976;
DeValois, DeValois & Yund, 1979; Andrews & Pollen, 1979). Are the linear filters that serve
motion-from-texture computations Gabor-like cortical simple cells? The theory reported here
provides a tool, and the demonstration experiments illustrate how it might be used to answer such
questions.

6.  Summary. 

The main contributions of this paper are (i) to introduce the notion of a random stimulus
microbalanced under all pointwise transformations, (ii) to provide necessary and sufficient conditions for a
random stimulus to be of this sort, (iii) to use this result to construct apparent motion stimuli called
texture quilts that are microbalanced under all purely temporal transformations, and (iv) to show that
subjects can reliably discriminate the motion direction of three kinds of texture quilts.

Texture quilts provide a flexible array of tools for studying motion perception that is truly
mediated by spatiotemporal modulation of spatial texture without contamination by mechanisms
responsive to the motion extracted directly by standard analysis or motion extracted by standard analysis of
any purely temporal transformation of the stimulus.
Figure legends

Fig. 1. The Reichardt detector. Let $I$ be a random stimulus. Then, in response to $I$, for $i = 1, 2$, the box
containing the spatial function $f_i: \mathbb{Z}^2 \to \mathbb{R}$ outputs the temporal function $\sum_{(x,y) \in \mathbb{Z}^2} f_i[x, y]\, I[x, y, t]$; each of
the boxes marked $g_i*$ outputs the convolution of its input with the temporal function $g_i: \mathbb{Z} \to \mathbb{R}$; each of
the boxes marked with a multiplication sign outputs the product of its inputs; the box marked with a minus
sign outputs its left input minus its right; and the box containing $h*$ outputs the convolution of its input
with the temporal function $h: \mathbb{Z} \to \mathbb{R}$. To see how the Reichardt detector senses motion, suppose $f_1$ is
identical to $f_2$, but shifted in space by some offset, and suppose the filters $g_2*$ do not alter their input,
while the filters $g_1*$ simply delay their input by some amount $\delta_t$ of time. Then a rigidly translating pattern
moving in the direction of box $f_1$'s offset from box $f_2$ will elicit some time-varying response from box $f_2$,
and the same response a short time later from box $f_1$. If that "short time later" is precisely $\delta_t$, the output of
the righthand multiplier will be positive as long as the pattern keeps drifting. This will result in a net
negative Reichardt detector output. If the pattern drift is in the opposite direction, the detector response will be
positive.
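The delay-and-multiply logic of this legend can be sketched with two sampled input signals. The code is a simplified illustration, not the detector of Fig. 1 in full: the spatial receptive fields are reduced to two point samples, the output filter $h$ is replaced by a time average, the delay is circular, and all names are hypothetical. The sign convention (left product minus right product) is chosen so that motion from the first input toward the second gives a negative output, as in the legend.

```python
import numpy as np


def reichardt_output(first_input, second_input, delay):
    """Time-averaged opponent output of a two-input delay-and-multiply correlator."""
    delayed_first = np.roll(first_input, delay)    # first_input[t - delay]
    delayed_second = np.roll(second_input, delay)  # second_input[t - delay]
    left_product = delayed_second * first_input    # favours motion second -> first
    right_product = delayed_first * second_input   # favours motion first -> second
    return float(np.mean(left_product - right_product))


rng = np.random.default_rng(0)
signal = rng.standard_normal(4000)
delay = 3

# Pattern reaches the second input `delay` samples after the first input.
toward_second = reichardt_output(signal, np.roll(signal, delay), delay)
# Pattern reaches the first input `delay` samples after the second input.
toward_first = reichardt_output(np.roll(signal, delay), signal, delay)
```

When the travel time between the inputs matches the delay, one multiplier receives two copies of the same signal and its output is a squared quantity, hence non-negative, while the other multiplier's output averages to roughly zero for a broadband signal.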

Fig. 2. Exposing the motion of the traveling contrast-reversal of the random black-or-white vertical bar
pattern J to standard motion-analysis. (a) An xt cross-section of J. (b) An xt cross-section of the partial
derivative of J with respect to time. (c) An xt cross-section of $|\partial J/\partial t|$. Each of J and $\partial J/\partial t$ is
microbalanced. However, $|\partial J/\partial t|$ is not. In particular, $|\partial J/\partial t|$ has most of its energy at those frequencies whose
velocity is equal to the velocity of the traveling contrast-reversal.

Fig. 3. Fourier and nonFourier motion mechanisms. (a) Fourier motion mechanisms apply standard
motion-analysis directly to the luminance signal L. (b, c, d) NonFourier mechanisms apply standard
motion analysis to a nonlinear transformation of luminance. (b) A simple nonFourier mechanism applies a
signal transformation comprised of a spatiotemporal linear filter, followed by a pointwise nonlinearity. The
*'s indicate spatial and temporal convolution, respectively, and • indicates multiplication. The filtering
performed in (b) is roughly pointwise in time (the temporal impulse response b2 approximates an impulse),
and the nonlinearity applied is a full-wave rectifier. This system (with appropriately chosen spatial filter,
b1) will extract the motion of the texture quilts shown in Figs. 4b, 5d, 6c, and 6d. It will not extract the
motion of stimulus J, the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a.
(c) A spatially pointwise (the spatial impulse response c1 approximates an impulse) system with a
flicker-sensitive temporal filter and a full-wave rectifier. Because of the flicker sensitivity, this mechanism will
extract the motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a but
not the motion of the texture quilts shown in Figs. 4b, 5d, 6c, and 6d. (d) The temporal filter d2 averages
the temporal filters b2 and c2, and the pointwise nonlinearity is a full-wave rectifier. With an appropriate
spatial filter d1, this nonFourier system extracts the motion of any corresponding texture quilt as well as the
motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a. However, it
would be less well-suited to these tasks than the detectors shown in (b) and (c) whose temporal filters it
averages.

Fig. 4. Edge-driven motion from an ordinary edge and from a binary texture quilt. (a) A rightward moving
light-dark edge visible to Fourier and nonFourier motion systems. Nine entire frames are shown; each
frame consists of an area of contrast +1 and an area of contrast -1. (b) A realization of the sidestepping,
randomly contrast-reversing vertical edge. This random stimulus is a texture quilt and hence microbalanced
under all purely temporal transformations; that is, its rightward motion would be inaccessible to standard
motion analysis even if this analysis were preceded by an arbitrary, purely temporal transformation. Each
frame of (b) was derived from the corresponding frame of (a) by multiplying the entire frame by a random
variable that takes the value 1 or -1 with equal probability. The frame random variables are jointly
independent. A straightforward way to extract the motion of this texture quilt is to (i) apply a linear filter
sensitive to vertical edges, (ii) rectify the filtered output, and (iii) submit the result to standard motion
analysis.
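Steps (i) and (ii) of this recipe can be sketched on one-dimensional frames. The sketch is illustrative only: each frame is reduced to a single row, the edge filter is a nearest-neighbour horizontal difference, the light side is arbitrarily placed on the left, and the function names and frame dimensions are assumptions.

```python
import numpy as np


def edge_quilt(n_frames=9, width=40, step=4, seed=0):
    """Rightward-stepping edge whose polarity is randomised independently per frame."""
    rng = np.random.default_rng(seed)
    x = np.arange(width)
    edge_positions = step * (np.arange(n_frames) + 1)
    # Ordinary edge: contrast +1 left of the edge, -1 to its right.
    frames = np.where(x[None, :] < edge_positions[:, None], 1.0, -1.0)
    # Multiply each whole frame by an independent random sign (+1 or -1).
    signs = rng.choice([-1.0, 1.0], size=n_frames)
    return frames * signs[:, None], edge_positions


def rectified_edge_response(frames):
    """(i) horizontal difference as a vertical-edge filter, (ii) full-wave rectification."""
    return np.abs(np.diff(frames, axis=1))


quilt, positions = edge_quilt()
response = rectified_edge_response(quilt)
```

After rectification the response peaks at the edge location in every frame, whatever sign that frame was multiplied by, so the rectified signal steps steadily rightward and can be passed on to standard motion analysis as step (iii).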
Fig. 5. Orientation-driven nonFourier motion from a binary texture quilt. (a) A probabilistically defined
sinewave grating that steps rightward 90 degrees between frames. The rightward motion in (a) is
accessible to all motion detectors. (b1) Four frames of a static, vertical squarewave grating; (b2) Four frames of a
static horizontal squarewave grating. (c) A rightward translating texture pattern. For every white point in
(a), the corresponding value in (c) is chosen from the vertical square-wave grating in (b1); for every black
point in (a), the corresponding value in (c) is chosen from the horizontal square-wave grating in (b2). (c) is
not microbalanced; standard motion-analyzers can be designed to detect its motion. (d) A texture quilt.
The frames of (d) are derived by multiplying the corresponding frames of (c) by jointly independent
random variables, each of which takes the value 1 or -1 with equal probability. The texture quilt (d) is
microbalanced under all purely temporal transformations, and therefore its rightward motion is unavailable to
any mechanism that applies standard motion analysis to a purely temporal transformation of the visual
signal.

Fig. 6. Sinusoidal texture quilts. Motion driven by differences in orientation and in spatial frequency. (b)
and (c) show realizations of random stimuli, each of which is microbalanced under all purely temporal
transformations. Their rightward motion cannot be detected by any mechanism that applies standard
motion analysis to a purely temporal transformation of the signal. In each case, the 4 frames in (a) select
between two sinusoidal patterns. The phases of sinusoids are jointly independent across frames and across
different-frequency sinusoidal components patched together in the same frame. The sinusoids mixed in (b)
differ in orientation, whereas the sinusoids mixed in (c) have the same orientation, but differ in spatial
frequency.

Fig. 7. Three quilts used to study motion carried by modulation of texture spatial frequency, by texture
orientation, and by higher order textural characteristics. (a) Eight frames that compose one cycle of the
F-quilt. Motion is generated by a squarewave modulation of textural spatial frequency. The squarewave
grating selects between vertical sinusoidal gratings of spatial frequency 1.2 c/deg and 2.4 c/deg. The
texture-modulating squarewave is 0.3 c/deg, and steps 1/4 cycle rightward on every odd frame. Every even
frame is independent of and distributed identically to the preceding frame. Presentation proceeds at a
rate of 30 frames/sec. This gives the texture-modulating squarewave a temporal frequency of 3.75 Hz and
a mean velocity of 25 deg/sec.

(b) Eight frames that comprise one cycle of the O-quilt. In the O-quilt, textural orientation is modulated
by the same squarewave used to modulate spatial frequency in the F-quilt. The O-quilt squarewave
selects between oppositely oriented sinusoidal gratings that have a spatial frequency of 2.8 c/deg.

(c) Eight frames that comprise one cycle of the E-quilt. In the E-quilt, the texture-modulating squarewave
selects between jointly independent binary noise and an "even" texture (Julesz, Gilbert & Victor, 1978).
Despite the evident difference between these two textures, every time-independent linear filter has the
same expected power for both textures. Thus, if motion-from-texture resulted from applying a simple
squaring transformation to the output of a spatial linear filter and submitting the result to standard motion
analysis, the motion of the E-quilt would be invisible.

Fig. 8. The percent of correct direction-of-motion judgments to the F-quilt, the O-quilt, and the E-quilt as
a function of stimulus duration. The panels show data for subjects CC and GA, respectively. Each data
point is the mean of 100 judgments. (Squares) F-quilt; (triangles) O-quilt; (circles) E-quilt. The stimulus
durations of 133, 266, 400, and 533 ms correspond to stimulus presentations of 0.5, 1, 1.5 and 2 quilt
cycles.
Here we present new data concerning two properties of the second-stage
filters: their contrast modulation sensitivity as a function of spatial frequency
(MTF), and the relation of first-stage spatial filtering to second-stage selectivity. To
determine the MTF, we used a staircase procedure to obtain amplitude
modulation thresholds for the detection of the orientation of Gabor modulations
of a bandlimited noise carrier. We used improved noise carriers with a narrower
bandwidth than the ones we reported last year. Four carrier bands were created
with center frequencies of 2, 4, 8, and 16 c/deg. The spatial frequency of the test
signals (Gabor amplitude modulations) ranged from 0.5 to 8 c/deg.
The improvements in our carriers produced a different pattern of results. (1)
The threshold amplitude of signal modulation was lowest for 0.5 and 1.0 c/deg.
Above 1.0 c/deg, threshold increased with frequency. (2) There was a
significant interaction of carrier frequency band with the modulating frequency,
with the lowest thresholds occurring for carrier frequency/modulation frequency
ratios of about three to four octaves. These results indicate that the second-stage
selective filters and detectors are most sensitive to frequencies lower than or equal
to 1 c/deg, and that they are selective with regard to the spatial frequency content
of the carrier noise on which the signals are impressed.
We examined motion carried by textural properties. The stimuli
used consisted of patches of sinusoidal gratings of various spatial
frequencies and contrasts. Phases were randomized to insure that motion
mechanisms sensitive to correspondences in stimulus luminance were not
systematically engaged.

We used an ambiguous apparent motion paradigm in which a
"heterogeneous" motion path (defined by alternating patches of a type A
and a type B texture) competes with a "homogeneous" motion path (defined
by patches of type A). We found that the strength of these (2nd order)
motion stimuli is determined by the covariance of the activity of the
textures that define the motion paths. The activity of a texture is a
hypothesized property that is proportional to the texture's contrast and is
found to be inversely proportional to its spatial frequency (within the range
of spatial frequencies examined). Indeed, heterogeneous motion between
equal contrast patches of a high spatial-frequency texture A and a
low-spatial frequency texture B can easily dominate homogeneous motion
between two patches of A because the activity of texture B is higher than
that of texture A.

At temporal frequencies higher than 4 Hz, we find that activity
covariance almost exclusively determines motion strength. At lower
temporal frequencies, similarity between textures becomes a significant
factor as well.

Supported by AFOSR Life Sciences, Visual Information Processing Program, Grant 88-0140.
CAN WE SEE 2nd ORDER MOTION AND TEXTURE IN THE PERIPHERY?

Joshua A. Solomon and George Sperling,

Human Information Processing Laboratory, New York University

Stimuli. Our 1st-order stimuli are moving sine gratings. Our 2nd-order stimuli
are patches of static visual noise, whose contrasts are modulated by moving sine
gratings. Neither the spatial orientation nor the direction of motion of these
2nd-order (drift-balanced) stimuli can be detected by analysis of their Fourier domain
power spectra. They are invisible to Reichardt and motion-energy detectors.
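A contrast-modulated noise stimulus of this kind can be sketched in one spatial dimension plus time. The code is a hedged illustration rather than the stimulus actually used: the function name, the binary carrier, the frame count, and the modulation depth are assumptions, and a single row stands in for a two-dimensional patch.

```python
import numpy as np


def second_order_stimulus(n_frames=16, width=256, cycles=4, depth=1.0, seed=0):
    """Static binary noise whose local contrast is modulated by a moving sine grating."""
    rng = np.random.default_rng(seed)
    carrier = rng.choice([-1.0, 1.0], size=width)  # static noise carrier
    x = np.arange(width)
    t = np.arange(n_frames)[:, None]
    # Rightward-moving sine grating completing one temporal cycle over n_frames.
    modulator = np.sin(2 * np.pi * (cycles * x / width - t / n_frames))
    envelope = 1.0 + depth * modulator             # local contrast, non-negative
    return carrier * envelope, modulator


stimulus, modulator = second_order_stimulus()
raw_match = float(np.mean(stimulus * modulator))              # close to zero
rectified_match = float(np.mean(np.abs(stimulus) * modulator))  # clearly positive
```

The signed stimulus is nearly uncorrelated with the moving grating because the carrier's polarity is random, whereas the full-wave rectified stimulus reproduces the contrast envelope exactly. This is the sense in which such stimuli are described as requiring a nonlinear stage before motion analysis.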

Method. For these dynamic stimuli, in the fovea, and at 12 deg eccentricity, we
measured contrast modulation thresholds as a function of spatial frequency for
discrimination of ±45 deg texture slant and for discrimination of direction of
motion. Spatial frequency was varied by changing viewing distance.

Results. For sufficiently low spatial frequencies and sufficiently large contrast
modulations, all stimuli are visible both foveally and peripherally. For peripherally
viewed 1st-order gratings, the highest spatial frequency at which motion or texture
discrimination is possible is about 1/4 that at which the corresponding
discrimination is possible for foveally viewed gratings. For peripherally viewed
2nd-order gratings, the highest spatial frequencies at which motion or texture
discrimination are possible are somewhat less than 1/4 the frequencies of the
corresponding foveal discriminations. Thus, as the stimulus moves peripherally,
the visual mechanisms that detect 2nd order motion and texture lose sensitivity
somewhat faster than the 1st-order mechanisms.

Conclusions. Under certain specific assumptions, our results suggest the
following about the neural detectors involved in these discriminations: (1) For both
motion and texture, there are more foveal than peripheral detectors at all spatial
frequencies. (2) There are more 1st-order than 2nd-order detectors. (3) On the
average, foveal detectors respond to higher spatial frequencies than peripheral
detectors. (4) The 2nd-order foveal-peripheral spatial frequency difference is
somewhat larger than the 1st-order difference.

Supported by AFOSR Life Sciences, Visual Information Processing Program, Grant 88-0140.
OBJECT SPATIAL FREQUENCIES, RETINAL SPATIAL
FREQUENCIES, NOISE, AND THE EFFICIENCY OF
LETTER DISCRIMINATION

David H. Parish and George Sperling*

Human Information Processing Laboratory, Department of Psychology and Center for Neural Sciences,

New York University, 10003, U.S.A.

(Received 7 July 1988; in revised form 2 June 1990)

Abstract— To  detemtine  «hich  spatial  frequencies  are  most  cflecthe  for  letter  identification,  and  wliether 
this  ts  because  letters  are  objectixel)  more  discnmina^  in  these  frequenc)  bands  or  because  can  uulue 
the  information  more  efhciectl>,  u«  studied  the  26  upper><aseleftersof  English  Six  tixcyociavc  uide  filters 
«ere  used  to  produce  spaiial]>  filtered  tetters  unh  2I^mean  frequencies  ranging  from  0  4  to  20  c\des  per 
letter  height  Subjects  attempted  to  identify  filtered  letters  tn  the  presence  of  identicaUv  filtered,  added 
Gaussian  noise  The  percent  of  correct  letter  identifications  vs  sin  (the  rooi-mean-squa*  -atio  of  signal 
to  noise  poutr)  uas  determined  for  each  band  at  four  vicumg  distances  ranpng  over  32  I  Object  spatial 
frequenev  band  and  s',n  determine  ptnenee  of  tnfottnauon  tn  the  stimulus  viewing  distance  determines 
retinal  spatial  frequenev.  and  affects  onlv  ahtm  to  uuUzt  Viewing  distance  had  no  effect  upon  letter 
discnminabi!ii>  object  spatial  frequenej  not  retinal  spatial  frequenej  determined  divcnminabilitv  To 
determine  discnmination  efficicnev  ue  compared  human  disenmination  to  an  ideal  disetiminaior  For  our 
two-octave  wide  bands,  r  n  performance  of  humans  and  of  the  ideal  detector  improved  with  freq>.e'icv 
mainl)  because  linear  bandwidth  increased  as  a  function  of  frequenev  Rclaiive  to  the  ideal  detector, 
human  elfioencv  was  0  in  the  lowest  frequenev  bands  reached  a  maximum  of  0  42  at  1  5  cvcles  per  object 
ar.d  dropped  to  about  0  104  m  the  highest  band  Thuv  our  subjects  best  extract  upper<ave  letter 
information  from  spatial  frequencies  of  1  5  cvcies  per  object  height  and  thev  can  extract  ti  with  equal 
efhciencv  over  a  ^2  I  range  of  retinal  frequenae>  from  0074  to  more  than  2  .^c>cle>  pet  degree  of  visual 
argie 

Spatial  filtering  Sca’e  invariance  Pwchophwicv  Contravi  seP'itiviiv  Acuitv 


INTRODUCTION

Characterizing  objects 

When we view objects, what range of spatial
frequencies is critical for recognition, and how
is our visual system adapted to perceive these
frequencies? Ginsburg (1978, 1980) was among
the first to investigate this problem by means of
spatial bandpass filtered images of faces and
lowpass filtered images of letters. He noted the
lowest frequency band for faces and the cutoff
frequency for letters at which the images seemed
to him to be clearly recognizable. The cutoff
frequency for letters was 1-2 cycles per letter
width; faces were best recognized in a band
centered at 4 cycles per face width. He also
proposed that the perception of geometric visual
illusions, such as the Mueller-Lyer and Poggendorf,
was mediated by low spatial frequencies
(Ginsburg, 1971, 1978; Ginsburg & Evans,
1979).


*To whom reprint requests should be addressed.


An issue that is related to the lowest frequency
band that suffices for recognition is the
encoding economy of a band. For a filter with
a bandwidth that is proportional to frequency
(e.g. a two-octave-wide filter), the lower the
frequency, the smaller the number of frequency
components needed to encode the filtered image
of a constant object. Combining these two
notions, Ginsburg concluded that objects were
best, or most efficiently, characterized by the
lowest band of spatial frequencies that sufficed
to discriminate them. Ginsburg (1980) went on
to suggest that higher spatial frequencies were
redundant for certain tasks, such as face or
letter recognition.

Several investigators were quick to point out
that objects can be well discriminated in various
spatial frequency bands. Fiorentini, Maffei and
Sandini (1983) observed that faces were well
recognized in either highpass or lowpass filtered
bands. Norman and Ehrlich (1987) observed that
high spatial frequencies were essential for
discrimination between toy tanks in photographs.
With respect to geometric illusions, both Janez
(1984) and Carlson, Moeller and Anderson
(1984) observed that the geometric illusions
could be perceived for images that had been
highpass filtered so that they contained no
low spatial frequencies. This suggests that low
and high spatial frequency bands may carry
equivalently useful information for higher visual
processes.

Characterizing the visual system

In the studies cited above, the discussion of
spatial filtering focuses on object spatial
frequencies, that is, frequencies that are defined in
terms of some dimension of the object they
describe (cycles per object). Most psychophysical
research with spatial frequency bands has
focused on retinal spatial frequencies, that is,
frequencies defined in terms of retinal coordinates.
For example, the spatial contrast sensitivity
function (Davidson, 1968; Campbell &
Robson, 1968) describes the threshold sensitivity
of the visual system to sine wave gratings
as a function of their retinal spatial frequency.
Visual system sensitivity is greatest at 3-10
cycles per degree of visual angle (c/deg). How
does visual system sensitivity relate to object
spatial frequencies?

Unconfounding retinal and object spatial
frequencies

Retinal spatial frequency and object spatial
frequency can be varied independently to determine
whether certain object frequencies are best
perceived at particular retinal frequencies. Object
frequency is manipulated by varying the
frequency band of bandpass filtered images;
retinal frequency is manipulated by varying the
viewing distance.

The cutoff object spatial frequency of low-pass
filters and the observer's viewing distance were
varied independently by Legge, Pelli, Rubin and
Schleske (1985), who studied reading rate of
filtered text at viewing distances over a 133:1
range. Over about a 6:1 middle range of distances,
reading rate was perfectly constant, and
it was approximately constant over a 30:1
range. At the longest viewing distances, there
was a sharp performance decrease (as the
letters became indiscriminably small). At the
shortest viewing distance, performance decreased
slightly, perhaps due to large eye movements
that the subjects would have to execute
to bring relevant material towards their lines of
sight, and to the impossibility of peripherally
previewing new text.

While viewing distance changed the overall
level of performance in Legge et al., the cutoff
object frequency of their low-pass filters at
which performance asymptoted did not change.
From this study, we learn that reading rate can
be quite independent of retinal frequency over a
fairly wide range, and that the critical object
frequency does not depend on viewing
distance. Because the authors measured reading
rate only in low-pass filtered images, we cannot
infer reading performance in higher spatial
frequency bands from their data.

Unconfounding object statistics and visual system
properties

Human visual performance is the result of the
combined effects of the objectively available
information in the stimulus, and the ability of
humans to utilize the information. In studying
visual performance with differently filtered images,
it is critical to separate availability from
ability to utilize. For example, narrow-band
images can be completely described in terms
of a smaller number of parameters (Fourier
coefficients or any other independent descriptors)
than wide-band images. Poor human
performance with narrow-band images may
reflect the impoverished image rather than
an intrinsically human characteristic: an ideal
observer would exhibit a similar loss.

The problem of assessing the utility of stimulus
information becomes acute in comparing
human performance in high and in low frequency
bandpass filtered images. Typically,
filters are constructed to have a bandwidth
proportional to frequency (constant bandwidth
in terms of octaves). For example, Ginsburg
(1980) used faces filtered into 2-octave-wide
bands, while Norman and Ehrlich (1987) also
used 2-octave bands for their filtered tank pictures.
With such filters, high spatial frequency
images contain more independent frequencies
than low frequency images.
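The growth of linear bandwidth with center frequency can be illustrated with a short sketch. It assumes an idealized band that is symmetric in log frequency about its center; the function names are illustrative and are not taken from the studies cited.

```python
def octave_band_edges(center, octaves=2.0):
    """Return (lower, upper) edges of a band `octaves` wide.

    The band is assumed to be symmetric about `center` on a log-frequency
    axis, so each edge lies octaves/2 octaves away from the center.
    """
    half_factor = 2.0 ** (octaves / 2.0)
    return center / half_factor, center * half_factor


def linear_bandwidth(center, octaves=2.0):
    """Width of the band in linear frequency units (upper minus lower edge)."""
    lower, upper = octave_band_edges(center, octaves)
    return upper - lower


if __name__ == "__main__":
    # Hypothetical center frequencies, one octave apart.
    for center in (0.5, 1.0, 2.0, 4.0):
        print(center, linear_bandwidth(center))
```

For a two-octave band the linear width is 1.5 times the center frequency, so each one-octave step in center frequency doubles the range of frequencies passed.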

Although linear bandwidth represents perhaps
the most important difference between images
filtered in octave bands at different frequencies,
the informational content of the various bands
also depends critically on the nature of the
specific class of objects, such as faces or letters.
Obviously, determining the information content
of images is a difficult problem. When it is not
solved, the amount of stimulus information
available within a frequency band is confounded
with the ability of human observers to use the
information. Direct comparisons of performance
between differently filtered objects are
inappropriate. This distinction between objectively
available stimulus information and the
human ability to use it has not been adequately
posed in the context of spatial bandpass
filtering.

Efficiency 

In the present context, physically available
information is best characterized by the performance
of an ideal observer. If there were no
noise in the stimulus, the ideal observer would
invariably respond perfectly. To compare the
performance of an observer, human or ideal,
noise of root-mean-square (r.m.s.) amplitude n
is progressively added to the signal of r.m.s.
amplitude s until the performance is reduced to
some criterion, such as 50% correct in a letter
identification task. This defines the signal to
noise ratio, $(s/n)_c$, for a criterion c. Efficiency eff
of human performance is defined by


where h and i indicate human and ideal observers,
and s and n are r.m.s. signal and noise
amplitudes (Tanner & Birdsall, 1958). In a pure,
quantally limited system, efficiency actually
represents the fraction of quanta absorbed
(utilization efficiency). In the context of signal
detection theory, efficiency is given by a d' ratio:

\[ \mathit{eff} = (d'_h / d'_i)^2 \]
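A minimal sketch of the efficiency arithmetic follows. It assumes the Tanner and Birdsall convention in which efficiency is the squared ratio of the ideal to the human r.m.s. signal-to-noise ratio at a common criterion; that squared-ratio form, the function names and the example numbers are assumptions made for illustration.

```python
def efficiency_from_snr(snr_ideal, snr_human):
    """Efficiency from r.m.s. s/n values measured at the same criterion.

    Assumption: efficiency is the squared ratio of the ideal observer's
    criterion s/n to the human observer's criterion s/n.
    """
    if snr_ideal <= 0 or snr_human <= 0:
        raise ValueError("signal-to-noise ratios must be positive")
    return (snr_ideal / snr_human) ** 2


def efficiency_from_dprime(d_human, d_ideal):
    """Efficiency as the squared ratio of human d' to ideal d'."""
    if d_ideal <= 0:
        raise ValueError("ideal d' must be positive")
    return (d_human / d_ideal) ** 2


if __name__ == "__main__":
    # Hypothetical values: the human needs twice the ideal observer's s/n.
    print(efficiency_from_snr(snr_ideal=0.1, snr_human=0.2))
    print(efficiency_from_dprime(d_human=1.0, d_ideal=2.0))
```

Under these assumptions, a human who needs twice the r.m.s. signal-to-noise ratio of the ideal observer to reach the criterion has an efficiency of one quarter.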

Overview

For an object that contains a broad spectrum
of spatial frequencies, object spatial frequency is
determined by the center frequency of a spatial
bandpass filtered image. Retinal spatial frequency
is determined by the viewing distance at
which the stimulus is viewed. Stimulus information
is determined jointly by the signal-to-noise
ratio, by the spatial filtering, and by the
characteristics of the set of signals; these three
informational components are combined in the
efficiency computation. Letters are a convenient
stimulus to study because they are highly overlearned,
so that human performance can be
expected to be reasonably efficient, and because
much is already known about the visibility of
letters in the presence of internal noise (letter
acuity) and about the visual processing of
letters.


Specifically, to evaluate the roles of object
and retinal spatial frequencies, letters are
filtered into various frequency bands. Noise is
added, and the psychometric function for correct
identification is determined as a function
of s/n. Accuracy depends only on s/n and not on
overall contrast, for a wide range of contrasts
(Pavel, Sperling, Riedl & Vanderbeek, 1987).
This determination is repeated for every combination
of object frequency band and viewing
distance. Thereby, retinal spatial frequency
and object spatial frequency are unconfounded,
enabling us to determine whether a particular
object frequency band is better discriminated
in one visual channel (retinal frequency) than
any other (Parish & Sperling, 1987a, b). Moreover,
by computing an ideal observer for the
identification task, we obtain an objective
measure of the information that is present in
each of the frequency bands. Finally, the comparison
of human performance with the performance
of the ideal observer gives us a precise
measure of the ability of our subjects to utilize
the information in the stimulus. Having
untangled these factors, we can determine which
spatial frequencies most efficiently characterize
letters for identification.

METHOD 

Two experiments were conducted using similar
stimuli and procedures.

Stimuli

Letters (signals) and noise. The original,
unfiltered letters were selected from a simple
5 x 7 upper-case font commonly used on CRT
terminals. Since this is an experiment in pattern
recognition, we felt that the simplest letter pattern
might be the most general; indeed, this font
has been widely used in letter discrimination
studies. For the purpose of subsequent spatial
filtering, the letters were redefined on a pixel
grid that measured 45 (vertical height) x 35
(maximum horizontal extent of letters M and
W). The letters had value 1 (white), the background
had value 0 (black). To avoid edge
effects in filtering, the background was extended
to 128 x 128 pixels for all computations. However,
only the center 90 x 90 pixels of the stimulus
were displayed, as these contained effectively
all the usable stimulus information, even for
low spatial-frequency stimuli. Letters for presentation
were chosen pseudo-randomly from
the set of 26 upper-case English letters.
Table 1. Parameters of the bandpass filters: lower and upper
half-amplitude frequencies, peak, and 2D mean frequencies
in cycles/letter height

Band    Lower    Peak        Upper    Mean*
0       0        Lowpass     0.53     0.39
1       0.26     0.53        1.05     0.74
2       0.53     1.05        2.11     1.49
3       1.05     2.11        4.22     2.92
4       2.11     4.22        8.44     5.77
5       6.33     Highpass    22.5     20.25

*Frequencies are weighted according to their squared amplitude
(power) in computing the mean.


Noise fields were defined on a 128 x 128 array by
choosing independent Gaussian noise samples
for each pixel, with the mean equal to zero and
a variance as required by the condition. (As
with the letters, only the central 90 x 90 pixels
were displayed.) Forty different noise fields were
created.

Filters. Each stimulus consisted of a filtered
letter added to an identically filtered noise field.
Six spatial filters were available, corresponding
to six successive levels of a Laplacian pyramid
(Burt & Adelson, 1983). The zero-frequency
component was added to the images so that they
could be viewed. The object-relative filter
characteristics, upper and lower half-amplitude
cutoff and 2D mean frequency (cycles per
letter height), appear in Table 1. The 2D mean
frequency $\bar{f}$ for a given band is

\[ \bar{f} = \frac{\sum_{i=0}^{127} \sum_{j=0}^{127} f_{i,j}\, a_{i,j}^{2}}{\sum_{i=0}^{127} \sum_{j=0}^{127} a_{i,j}^{2}} \]

where $f_{i,j}$ is the 2D frequency and $a_{i,j}$ is its
amplitude. Cycles per object height is used
rather than the more usual cycles per object
width because the height of our upper-case
letters remained constant across the entire set,
whereas the width varied between letters.
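The power-weighted mean frequency can be sketched numerically as follows. This is an illustration under stated assumptions: the 2D frequency of each component is taken to be its radial frequency, the spectrum is laid out as NumPy's `fft2` output, and `mean_frequency_2d` and its `scale` argument are hypothetical names introduced here.

```python
import numpy as np


def mean_frequency_2d(amplitude, scale=1.0):
    """Power-weighted mean radial frequency of a 2D amplitude spectrum.

    amplitude : 2D array in np.fft.fft2 layout (zero frequency at [0, 0]).
    scale     : factor converting cycles per field into other units, e.g.
                45 / 128 for cycles per letter height when a 45-pixel
                letter sits in a 128-pixel field (an assumed conversion).
    """
    amplitude = np.asarray(amplitude)
    n_rows, n_cols = amplitude.shape
    # Signed frequencies in cycles per field along each axis.
    fy = np.fft.fftfreq(n_rows) * n_rows
    fx = np.fft.fftfreq(n_cols) * n_cols
    radial = np.hypot(fy[:, None], fx[None, :])
    power = np.abs(amplitude) ** 2
    return float((radial * power).sum() / power.sum()) * scale


if __name__ == "__main__":
    spectrum = np.zeros((128, 128))
    spectrum[0, 1] = 1.0   # radial frequency 1, power 1
    spectrum[0, 3] = 2.0   # radial frequency 3, power 4
    print(mean_frequency_2d(spectrum))  # (1*1 + 3*4) / 5
```

Because the weights are squared amplitudes, the stronger component dominates the mean, which is why a spectrum skewed toward high frequencies has a mean well above its peak.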

The transfer functions (spectra) of the filters
are displayed in Fig. 1. Approximately, filters
are separated in spatial frequency by an octave
(factor of 2) and have a bandwidth at half-amplitude
of two octaves. The small mound in
the lower right corner of Fig. 1 is a negligible
imperfection in filter 4. For convenience, the
limited range of spatial frequencies passed by
each of the filters will be referred to as the band
of that filter; a specific band is $b_i$ (i = 0, 1, 2, 3,
4, 5), where $b_0$ is the lowest set of frequencies
and $b_5$ is the highest.

The filter spectra (shown in Fig. 1) are
approximately symmetrical in log frequency
coordinates; a symmetrical spectrum in log coordinates
is highly skewed to the right in linear
frequency coordinates, resulting in a mean that
is much greater than the mode. In a 2D (vs 1D)
filter, the rightward shift is accentuated. For
example, band 2 has a peak frequency of 1.05
c/object but a 2D mean frequency of 1.49
c/object. The single most informative characterization
of such a skewed bandpass spectrum
depends somewhat on the context; we usually use
the mean rather than the peak.

Fig. 1. Filter characteristics for the filters used in the
experiments. There are two abscissas, both on a log scale:
the top abscissa is the frequency in cycles per unwindowed
field width (128 pixels); the bottom abscissa is in cycles per
letter height (45 pixels). The ordinate is the normalized gain.
The parameter i indicates the filter designation $b_i$ in the text.
Figure 2 (top) shows the letter G, filtered in
bands 1-5 without noise; the bottom shows the
same signals plus noise, s/n = 0.5. The full
128 x 128 array (extended by reflection beyond
its edges) was passed through the filter so that
the effect of the picture boundary did not
intrude into the critical part of the display.
Signal to noise ratio, s/n. A filtered letter is a
signal. Let i, j index a particular pixel in the x, y
coordinate space of the stimulus. The signal
contrast $c_s(i,j)$ of pixel i, j is

\[ c_s(i,j) = \frac{l_{i,j} - l_0}{l_0} \qquad (1) \]

where $l_{i,j}$ is the luminance of pixel i, j and $l_0$ is
the mean signal luminance over the 90 x 90
array. Signal power per pixel, s, is defined as
mean contrast power averaged over the 90 x 90
pixel array:

\[ s = (IJ)^{-1} \sum_{i=1}^{I} \sum_{j=1}^{J} c_s^{2}(i,j) \qquad (2) \]

where $c_s(i,j)$ is the contrast of pixel i, j and
I = J = 90.

Noise contrast $c_n(i,j)$ is the value of the i, jth
noise sample divided by the mean luminance.
Analogously to signal power (equation 2), noise
contrast power per pixel, n, is equal to $(\sigma / l_0)^2$.
The signal to noise ratio is simply s/n.
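The contrast, power and signal-to-noise definitions above can be sketched as follows. The function names and example values are illustrative, and $\sigma$ is assumed here to be the standard deviation of the noise in luminance units.

```python
import numpy as np


def contrast(luminance, mean_luminance):
    """Equation (1): pixel contrast relative to the mean luminance."""
    return (np.asarray(luminance, dtype=float) - mean_luminance) / mean_luminance


def signal_power(signal_luminance, mean_luminance):
    """Equation (2): mean squared contrast over the displayed array."""
    c = contrast(signal_luminance, mean_luminance)
    return float(np.mean(c ** 2))


def noise_power(noise_sd, mean_luminance):
    """Noise contrast power per pixel, (sigma / l0) ** 2."""
    return (noise_sd / mean_luminance) ** 2


def signal_to_noise(signal_luminance, noise_sd, mean_luminance):
    """Ratio of signal power to noise power, s / n."""
    return signal_power(signal_luminance, mean_luminance) / noise_power(
        noise_sd, mean_luminance
    )


if __name__ == "__main__":
    l0 = 47.5  # mean luminance in cd/m^2
    # Hypothetical 90 x 90 signal: half the pixels at +10% contrast, half at -10%.
    lum = np.full((90, 90), 1.1 * l0)
    lum[:, :45] = 0.9 * l0
    print(signal_to_noise(lum, noise_sd=0.2 * l0, mean_luminance=l0))
```

In this hypothetical example the signal power is 0.01 and the noise power is 0.04, giving s/n = 0.25.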
Quantization. Our display system produced
256 discrete luminance levels. Level 128 was
used as the mean luminance $l_0$, which was
47.5 cd/m^2. To produce a visual display of a
given letter, band, and s/n, signal power s and
noise power n were normalized so that the
luminance of every one of the 8100 displayed
pixels fell within the range of the display system;
there was no truncation of the tails of the
Gaussian noise. (Although the relationship between
input gray-level and output luminance
was not quite linear at the extreme intensity
values, it was determined that more than 90%
of the pixels fell within the linear intensity
range.) Intensity normalization was applied separately
to each stimulus (combination of signal
plus noise). By normalizing the total stimulus
s + n, the actual value of s displayed to the
subject diminished as n increased, i.e. the actual
value of s was not known by the subject. Indeed,
even stimuli with precisely the same letter in the
same band and with the same s/n might be
produced with slightly different s and n, depending
on the extreme values of the noise fields.
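One way such a per-stimulus normalization could work is sketched below. It assumes a single gain applied to the summed signal-plus-noise contrast image and a gray level that is linear in luminance; both are simplifying assumptions, and `to_gray_levels` is a hypothetical name.

```python
import numpy as np


def to_gray_levels(contrast_image, mean_level=128, max_level=255):
    """Scale a contrast image so that every pixel fits the display range.

    One gain multiplies the whole signal-plus-noise image, so s and n
    shrink together and their ratio is unchanged. Returns the integer
    gray levels and the gain that was applied.
    """
    c = np.asarray(contrast_image, dtype=float)
    gains = []
    if c.max() > 0:
        # Largest positive contrast must map to at most max_level.
        gains.append((max_level - mean_level) / (mean_level * c.max()))
    if c.min() < 0:
        # Largest negative contrast must map to at least level 0.
        gains.append(1.0 / -c.min())
    gain = min(gains) if gains else 1.0
    levels = np.rint(mean_level * (1.0 + gain * c))
    levels = np.clip(levels, 0, max_level).astype(int)
    return levels, gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stimulus = rng.normal(0.0, 0.3, size=(90, 90))  # hypothetical contrast image
    levels, gain = to_gray_levels(stimulus)
    print(levels.min(), levels.max(), gain)
```

Because the gain depends on the most extreme pixel, two stimuli with the same nominal s/n but different noise fields receive slightly different gains, in line with the remark above about slightly different s and n.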

Seven values of s/n were available for each
band, chosen in a pilot study to ensure that the
data yielded the entire psychometric function
(chance to best performance). The same pilot
study showed that subjects never performed
above chance when confronted with noise-free
letters from $b_0$; this band was omitted from the
present study.

Procedure: experiment 1

Four of the experimental variables (letter
identity, noise field, frequency band, and s/n)
were randomized within each session. A fifth
variable, viewing distance, was held constant
within each session and was varied between
sessions. Four viewing distances were used:
0.121, 0.38, 1.21 and 3.84 m. A chin rest was
used to stabilize the subject's head for viewing
at the shortest distance. At the four distances,
the 90 x 90 pixel stimulus subtended 31.6, 10,
3.16 and 1.0 deg of visual angle respectively. The
upper and lower half-amplitude cut-off retinal
frequencies for the six filters, with respect
to the four viewing distances used in this experiment,
and for a fifth distance used in the second
experiment, appear in Table 2. Subjects participated
in four 1-h sessions at each viewing
distance. Each session consisted of 315 trials:
nine trials at each of seven s/n's for each of the
five frequency bands.
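The inverse relation between viewing distance and subtended angle for the experiment 1 distances can be sketched as below. The small-angle scaling is an assumption, and the 1.0 deg subtense at 3.84 m is used as the reference point; the function name is illustrative.

```python
def visual_angle_deg(reference_angle_deg, reference_distance_m, distance_m):
    """Small-angle estimate of subtended angle at a new viewing distance.

    Assumes angular size scales inversely with distance, which is only
    approximate at the largest angles.
    """
    if distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    return reference_angle_deg * reference_distance_m / distance_m


if __name__ == "__main__":
    # Reference: the 90 x 90 pixel stimulus subtends 1.0 deg at 3.84 m.
    for d in (0.121, 0.38, 1.21, 3.84):
        print(d, round(visual_angle_deg(1.0, 3.84, d), 2))
```

Under this approximation the estimates fall close to the reported 31.6, 10, 3.16 and 1.0 deg, and the ratio between the nearest and farthest distances is about 32:1.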

Prior to the first session, subjects were shown
noise-free examples of the unfiltered letters.
They were told that each stimulus presentation
consisted of a letter and a certain amount of
noise, and that the letter might appear degraded
in some way. They were informed that at no
time would a letter be shifted in orientation or
from its central location in the stimulus field.
Finally, they were instructed to view each stimulus
for as long as they desired before making
their best guess as to which letter had been
presented. A response (letter identity) was
required on every trial. Subjects typed the
response on a keyboard connected to the host
computer (Vax 11/750); subsequently, typing a
carriage return erased the video screen and
initiated the next trial in a few seconds. The
room illumination was very dim; the response
keyboard was lighted by stray light from its
associated CRT terminal. No feedback was
offered to the subjects.

Observers

Three subjects, two male and one female,
between the ages of 20 and 27 participated in the
experiment. All subjects had normal or corrected-to-normal
vision. One of the subjects was
a paid participant in the study.

Procedure: experiment 2

This experiment was run before expt 1. It is
reported here because it offers additional data
with two new and one old subject at a fifth
viewing distance. Except as noted, the procedures
are similar to expt 1. The screen was
viewed through a darkened hood at a distance
of 0.48 m. At this distance, the 90 x 90 stimuli
subtended 7.15 deg of visual angle. The half-amplitude
cut-off frequencies and the mean
frequencies of the six spatial filters are given in
the rightmost column of Table 2. Three male
subjects between the ages of 20 and 27 participated
in the experiment. All subjects had
normal or corrected-to-normal vision. Two of
the subjects were paid for their participation,
and one, DHP, also participated in expt 1. Five
sessions of 315 trials were run for each subject.

Table 2. Lower and upper half-power frequency and 2D mean frequency (in c/deg of visual angle) for all bands and viewing
distances used in both experiments

                                         Viewing distance (m)
Band           0.12               0.38               1.21                3.84                 0.48
0 (lowpass)    0.00-0.04 (0.03)   0.00-0.12 (0.09)   0.00-0.37 (0.27)    0.00-1.18 (0.87)     0.00-0.15 (0.11)
1              0.02-0.07 (0.05)   0.06-0.23 (0.16)   0.18-0.74 (0.52)    0.58-2.34 (1.65)     0.07-0.29 (0.21)
2              0.04-0.15 (0.10)   0.12-0.47 (0.33)   0.37-1.48 (1.04)    1.18-4.70 (3.30)     0.15-0.59 (0.41)
3              0.07-0.30 (0.20)   0.23-0.94 (0.64)   0.74-2.97 (2.04)    2.34-9.40 (6.48)     0.29-1.18 (0.81)
4              0.15-0.59 (0.40)   0.47-1.88 (1.27)   1.48-5.94 (4.04)    4.70-18.80 (12.82)   0.59-2.36 (1.60)
5 (highpass)   0.30-2.25 (1.41)   0.94-7.?3 (4.4?)   2.97-22.53 (14.19)  9.40-71.27 (45.00)   1.77-8.96 (5.63)

RESULTS 

Psychometric functions: p vs log10 s/n

The measure of performance is the observed
probability p of a correct letter identification.
The complete psychometric functions are displayed
in Figs 3 (expt 1) and 4 (expt 2). A
separate psychometric function is shown for
each subject, viewing distance and frequency
band. In band $b_1$, for all subjects, performance
asymptotes (for noiseless stimuli) at p ≈ 0.5. In
all other bands, performance improves from
near-chance (1/26) to near perfect as the value
of s/n increases.

Noise  resistance  as  a  function  of  frequency  band 

An obvious aspect of the data of both experiments
is that the data move to the left of the
figure panels as band spatial frequency increases.
This means that high spatial frequency
stimuli (bands $b_4$, $b_5$) are identifiable at smaller
s/n than stimuli in bands $b_1$ and $b_2$; resistance to
noise increases with spatial frequency band. To
enable comparisons of noise sensitivity as a
function of band, the s/n at which p = 50% was
estimated for each subject and frequency band
from expt 1 by means of inverse interpolation
from the best fitting logistic function. As viewing
distance had no effect, all estimates were
made using the data collected when viewing
distance was equal to 0.38 m. A graph of these
points as a function of the mean object
frequency of the band is plotted in Fig. 5 (circles).
For comparison, the expected rate of improvement
in $(s/n)_{50\%}$, based on the increasing number
of frequency components as one moves from
low to high frequency bands, is plotted as a
series of parallel lines in Fig. 5. Performance
improves ($(s/n)_{50\%}$ decreases) somewhat faster
than 1/f (the slope of the parallel lines). These
results, and Fig. 5, will be analyzed in detail in
the Discussion section.

Fig. 3. Psychometric functions from expt 1. Each graph displays performance as a function of log10 s/n
within a frequency band. The parameter is viewing distance. Subjects are arranged in columns and
frequency band is arranged in rows, progressing from the highest frequency band at the top to the lowest
band at the bottom. The four viewing distances are 3.84, 1.21, 0.38 and 0.121 m, each plotted with its
own symbol.

Fig. 4. Psychometric functions for each subject and frequency
band in expt 2. Viewing distance was 0.48 m. The
five frequency bands, $b_1$-$b_5$, are each indicated by a
separate symbol. The probability of a correct response
is plotted as a function of log10 s/n.
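The inverse interpolation step used to estimate the 50% threshold can be sketched as follows. The exact logistic form is not given here, so the parameterization below (a chance floor of 1/26, a midpoint and a slope on the log s/n axis) is an assumption, as are the function names and parameter values.

```python
import math


def logistic_p(log_snr, midpoint, slope, guess=1.0 / 26.0):
    """Probability correct under an assumed logistic with a chance floor."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (log_snr - midpoint)))


def log_snr_at(p_target, midpoint, slope, guess=1.0 / 26.0):
    """Invert the fitted logistic: the log s/n that yields p_target."""
    if not guess < p_target < 1.0:
        raise ValueError("p_target must lie between the chance level and 1")
    # Rescale p_target to the floor-free logistic before inverting.
    q = (p_target - guess) / (1.0 - guess)
    return midpoint + math.log(q / (1.0 - q)) / slope


if __name__ == "__main__":
    # Hypothetical fitted parameters for one subject and band.
    threshold = log_snr_at(0.5, midpoint=-0.3, slope=4.0)
    print(threshold, logistic_p(threshold, midpoint=-0.3, slope=4.0))
```

With a chance floor of 1/26, the 50% point lies slightly below the midpoint of the underlying logistic; with no floor, the two coincide.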


Fig. 5. Performance of human subjects and various computational
discriminators, plotted against 2D mean frequency
(cycles/letter height). The abscissa indicates log10 of the
mean frequency of each bandpass stimulus. The ordinate
indicates the (interpolated) s/n ratio at which a probability
of a correct response p = 0.5 is achieved. Circles indicate
each of the three subjects in expt 1 at the intermediate
viewing distance of 1.21 m. In band $b_1$, 2 of 3 human
subjects fail to achieve 50% correct (eff = 0); these points lie
outside the graph. Separate symbols indicate the sub-ideal and
super-ideal performances of discriminators that bracket the
ideal discriminator. The shaded area below the super-ideal
discriminator indicates theoretically unachievable performance.
Squares indicate performance of a spatial
correlator-discriminator. The oblique parallel lines have slope -1,
which represents the improvement in expected performance
(decrease in s/n) as a function of the number of frequency
components in each band when filter bandwidth is
proportional to frequency.

The non-effect of viewing distance

Another property of the data is that, in most
conditions, viewing distance has no effect on
performance. Analysis of variance, carried out
individually for each subject, shows that there is
no significant effect of distance in any band for
subject DHP and a significant effect of distance in
bands $b_4$ and $b_5$ for the other two subjects.
Further analysis by a Tukey test (Winer, 1971)
in bands $b_4$ and $b_5$ for these subjects shows that
the only significant effect of distance is that
visibility at the longest viewing distance is better
than at the other three distances. For subject
CJD, the improvement is equivalent to a gain in
s/n of 0.19 and 0.28 log10 units (for bands $b_4$ and $b_5$,
respectively); for MAV, the corresponding gains
were 0.21 and 0.40.

Improved performance at long viewing distances
is almost certainly due to the square
configuration of individual pixels, which produces
a high frequency spatial pixel noise that is
attenuated by viewing from sufficiently far away
(Harmon & Julesz, 1973). In low frequency
bands, pixel-boundary noise is not a problem
because the spatial filtering ensures that adjacent
pixels vary only slightly in intensity. We explored
the hypothesis of pixel-boundary noise
with subject CJD, who showed a distance effect
in band 5. At an intermediate viewing distance
of 1.21 m, CJD squinted her eyes while viewing
stimuli from band 5. By blurring the retinal
image of the display in this way, performance
improved approximately to the level of the
furthest viewing distance.

To summarize, the only significant effect of
distance that we observed was a lowering of
performance at near viewing distances relative
to the furthest distance. This impairment
occurred primarily in bands 4 and 5. In these
bands, the spatial quantization of the display
(90 x 90 square-shaped pixels) produces artifactual
high spatial frequencies that mask
the target. These artifactually produced spatial
frequencies can be attenuated by deliberate
blurring (squinting), or by producing displays
with higher spatial resolution, or by increasing
the viewing distance to the point where the pixel
boundaries are attenuated by the optics of the
eye and neural components of the visual modulation
transfer function. In all cases, blurring
improves performance and eliminates the
slightly deleterious effect of a too small viewing
distance. Thus, for correctly constructed stimuli,
in the frequency ranges studied, there would
be no significant effect of viewing distance on
performance. This finding is in agreement with
the results of Legge et al. (1985), who examined
reading rate rather than letter recognition. It is
in stark disagreement with the results of
sine-wave detection experiments in which retinal
frequency is critical; see Sperling (1989) for an
explanation.

DISCUSSION 

A comparison of performance in different
frequency bands shows that subjects perform
better the higher the frequency band, and subjects
require the smallest signal-to-noise ratio
in the highest frequency band. To determine
whether performance in high frequency bands is
good because humans are more efficient in
utilizing high-frequency information, or because
there is objectively more information in the
high-frequency images, or both, requires an
investigation of the performance of an ideal
observer. The performance of the ideal observer
is the measure of the objective presence of
information. Human performance results from
the joint effect of the objective presence of
information and the ability of humans to utilize
that information. Human efficiency is the ratio
of human performance to ideal performance.


Ideal discriminator

Definition. An ideal discriminator makes the
best possible decision given the available data
and the interpretation of "best." The performance
of the ideal discriminator defines the objective
utility of the information in the stimulus.
We prefer the name ideal discriminator, rather
than ideal observer, because it indicates the
critical aspect of performance under consideration,
but we occasionally use ideal observer to
emphasize the relations to a large, relevant
literature on this subject. Our purposes in this
section are first, to derive an ideal discriminator
for the letter identification task; second, to
develop a practical working approximation to
this discriminator; and third, to compare the
performance of the human with the ideal
discriminator.

Although ideal observers have recently come
into greater use in vision research, the applications
have focused primarily on determining
the limits of performance for relatively low-level
visual phenomena. For example, Barlow (1978,
1980) and Barlow and Reeves (1979) investigated
the perception of density and of mirror
symmetry; Geisler (1984) investigated the limits
of acuity and hyperacuity; Legge, Kersten and
Burgess (1987) examined the pedestal effect;
Kersten (1984) studied the detection of noise
patterns; and Pelli (1981) detailed the roles of
internal visual noise. Geisler (1989) provides an
overview of efficiency computations in early
vision. Our application differs from these in that
we expand the techniques and apply them to
a higher perceptual/cognitive function, letter
recognition.

For the letter identification task, the ideal
discriminator is conceptually easy to define. A
particular observed stimulus, x, representing an
unknown letter plus noise, consists of an intensity
value (one of 256 possible values) at each of
90 × 90 locations. The discriminator's task is to
make the correct choice as frequently as possible
from among the 26 alternative letters.

The likelihood of observing stimulus x, given
each of the 26 possible signal alternatives, can
be computed when the probability density function
of the added noise is known exactly. The
optimal decision chooses the letter that has the
highest likelihood of yielding x. The expected
performance of the ideal discriminator is computed
by summing its probability of a correct
response over the $256^{8100}$ possible stimuli (256
gray levels, 90 × 90 pixels). Unfortunately,
when there is both bandpass filtering and intensity
quantization, the usual simplifications that
make this enormous computation tractable are
not applicable.

Fig. 6. Flow chart of the experimental procedures that are modelled by the ideal discriminator analysis.
Upper half indicates space domain operations; lower half indicates the corresponding operations in the
frequency domain. Computations are carried out on 128 × 128 arrays; the subject sees only the center
90 × 90 pixels. A random letter and a random noise field are each filtered by the same filter (b); the noise
is amplified to provide the desired signal-to-noise ratio, the letter and noise are added, the output is scaled
and quantized (represented by the addition of digitization noise), and the result is shown to the subject.
In the frequency domain $\omega_x, \omega_y$, the bandpass filter selects an annulus, whereas the quantization noise
is uniform over $\omega_x, \omega_y$.

As an alternative to computing the expected
performance of the ideal discriminator, one can
compute its performance with a particular subset
of the possible stimuli: the stimuli that the
subject actually viewed or, preferably, a larger
set of stimuli for more reliable estimation. This
Monte Carlo simulation of the performance
of the ideal discriminator is a tractable computation
that yields an estimate of expected
performance.
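As an illustration of the Monte Carlo idea only, the following minimal sketch estimates the proportion correct of a maximum-likelihood discriminator by simulation. It is not the bracketed discriminator developed below: it assumes independent, equal-variance Gaussian noise at every pixel and no quantization, in which case maximum likelihood reduces to choosing the nearest template. The function names, the random stand-in templates and the noise levels are all hypothetical.

```python
import numpy as np

def ml_choice(stimulus, templates):
    """Return the index of the most likely template.

    Assumes independent, equal-variance Gaussian pixel noise, so the
    maximum-likelihood template is the one at minimum squared distance.
    """
    diffs = templates - stimulus                 # shape (n_templates, n_pixels)
    return int(np.argmin(np.sum(diffs * diffs, axis=1)))

def monte_carlo_accuracy(templates, noise_sd, n_trials, rng):
    """Estimate proportion correct from n_trials simulated noisy stimuli."""
    n_templates, n_pixels = templates.shape
    correct = 0
    for _ in range(n_trials):
        i = rng.integers(n_templates)            # random "letter" on each trial
        stimulus = templates[i] + rng.normal(0.0, noise_sd, n_pixels)
        correct += int(ml_choice(stimulus, templates) == i)
    return correct / n_trials

rng = np.random.default_rng(0)
templates = rng.normal(size=(26, 100))           # hypothetical stand-ins for 26 letters
low_noise = monte_carlo_accuracy(templates, 0.1, 200, rng)
high_noise = monte_carlo_accuracy(templates, 50.0, 200, rng)
```

Increasing `n_trials` tightens the estimate, which is the sense in which a larger stimulus set gives more reliable estimation.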

Derivation. Stimulus construction is diagrammed
in Fig. 6, which shows the equivalent
operations in the space and the frequency domains.
To derive an ideal discriminator, we need
to carefully review the processes of stimulus
construction. We use uppercase letters to represent
quantities in the frequency domain and
lowercase letters to represent quantities in the
space domain. A letter is defined by a 90 × 90
array that takes the value 1 at the letter
locations and 0 at the background locations.
When this array is spatially filtered in band b, it
defines the letter template $t_{i,b}(x, y)$, where $i$
indicates the particular letter, $b$ the frequency
band, and $x, y$ the pixel location. We write
$T_{i,b}(\omega_x, \omega_y)$ for the Fourier series coefficient of
$t_{i,b}$ indexed by frequency.

An unknown stimulus $u_{i,b}(x, y)$ to be viewed
by a subject is produced by adding filtered
noise $n_b(x, y)$ with post-filtering variance $\sigma_b^2$ to the
template $t_{i,b}(x, y)$, where letter identity $i$ is
unknown to the subject. The stimulus is scaled and
digitized (quantized) to 256 levels prior to
presentation, contributing an additional source of
noise $q_{i,b}(x, y)$, called digitization noise. Finally,
a d.c. component (dc) is added to $u_{i,b}$ to bring
the mean luminance level to 128. These steps are
diagrammed in Fig. 6, which shows both the
space-domain and the corresponding frequency-domain
operations. The space-domain computation
is encapsulated in equations (3).

$$u_{i,b}(x, y) = \beta_{i,b}\,[\,t_{i,b}(x, y) + n_b(x, y)\,] \qquad (3a)$$

$$u_{i,b}(x, y) = \beta_{i,b}\,[\,t_{i,b}(x, y) + n_b(x, y)\,] + q_{i,b}(x, y) + \mathrm{dc} \qquad (3b)$$

The scaling constant $\beta_{i,b}$ limits the range of
real values for each pixel, prior to quantization,
to (-0.5, 255.5). The degree of scaling is determined
by the maximum and minimum values in
the function $t_{i,b} + n_b$. Note that the extreme
values in the image are determined by $n_b$, which
is adjusted to yield the appropriate s/n for each
condition; the values of $t_{i,b}$ are fixed prior to
scaling. Specifically,

$$\beta_{i,b} = \frac{256}{\max(t_{i,b} + n_b) - \min(t_{i,b} + n_b)}.$$
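The construction steps described above (filter, add noise, scale, quantize, add a d.c. level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ideal annular filter `bandpass`, the parameter `noise_gain`, the clipping to 0..255 and the crude bar-shaped "letter" are all hypothetical stand-ins.

```python
import numpy as np

def bandpass(image, f_lo, f_hi):
    """Ideal annular filter: keep radial frequencies in [f_lo, f_hi) cycles/image.

    A stand-in for the paper's filters; assumes a square image.
    """
    n = image.shape[0]
    freqs = np.fft.fftfreq(n) * n                    # integer cycles per image
    radius = np.hypot(freqs[:, None], freqs[None, :])
    mask = (radius >= f_lo) & (radius < f_hi)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

def make_stimulus(letter, f_lo, f_hi, noise_gain, rng, dc=128.0):
    """Return the quantized stimulus u, the scaling constant beta, and q."""
    t = bandpass(letter, f_lo, f_hi)                 # filtered letter template
    n = noise_gain * bandpass(rng.normal(size=letter.shape), f_lo, f_hi)
    s = t + n
    beta = 256.0 / (s.max() - s.min())               # scaling constant
    continuous = beta * s + dc                       # values before quantization
    # Clipping is an added safeguard (an assumption): it only matters when
    # the scaled image is not centered on the d.c. level.
    u = np.clip(np.rint(continuous), 0, 255)         # 256 gray levels
    q = u - continuous                               # digitization noise
    return u.astype(np.uint8), beta, q

rng = np.random.default_rng(1)
letter = np.zeros((128, 128))
letter[40:88, 58:70] = 1.0                           # crude bar standing in for a letter
u, beta, q = make_stimulus(letter, f_lo=2, f_hi=8, noise_gain=1.0, rng=rng)
```

For every pixel that is not clipped, the digitization noise `q` has magnitude at most 0.5 gray level, which is why it can be treated as a separate additive noise source.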

As a result of bandpass filtering, the
noise samples in adjacent pixels are strongly
dependent on each other. Therefore, the discriminator
problem is best approached in the
Fourier domain, where the random variables
$\{N_b(\omega_x, \omega_y)\}$ are jointly independent because
the filtering operations simply scale the different
frequency components without introducing
any correlations (van Trees, 1968). The
task of the ideal discriminator is to pick the
template $t_{i,b}$ that maximizes the likelihood of $u_{i,b}$
with a priori knowledge of: (i) the fixed functions
$T_{i,b}$ and their probabilities, and (ii) the
densities of the jointly independent random
variables $\{N_b(\omega_x, \omega_y)\}$. As is clear, $\beta_{i,b}$, $\sigma_b$
and $\{N_b(\omega_x, \omega_y)\}$ are all jointly
distributed random variables characterized by
some density $f$. To compute the likelihood of $u_{i,b}$,
the ideal discriminator must integrate $f$ over all
possible values that may be assumed by the
set of jointly distributed random variables,
whose values are constrained only in that they
result in a possible stimulus $u_{i,b}$. Unfortunately,
no closed-form solution to this problem is available,
forcing us to look for an alternative
approach.

Bracketing. To estimate the performance of
the ideal discriminator, we look for a tractable
super-ideal discriminator that is better than the
ideal but which is solvable. Similarly, we look
for a tractable sub-ideal discriminator that is
worse than the ideal. The ideal discriminator
must lie between these two discriminators; that
is, we bracket its performance between that of
a "super-ideal" and a "sub-ideal" discriminator.
The more similar the performance of the super-
and sub-ideal discriminators, the more constrained
is the ideal performance which lies
between them.

Our super-ideal discriminator is told, a priori,
the exact values for $\beta_{i,b}$ and $\sigma_b$ for each stimulus
presentation. Therefore, it is expected to
perform slightly better than the ideal discriminator,
which must estimate these values from
the data. The sub-ideal discriminator estimates
these same parameters from the presented
stimulus in a simple but nonideal way. Therefore,
it is expected to perform slightly worse
than the ideal discriminator. The computational
forms used to compute the estimates of $\beta_{i,b}$ and
$\sigma_b$ for the sub-ideal discriminator are presented
in the Appendix, along with the derivation of the
likelihood estimator used by both discriminators.
A complete discussion of these derivations
and the problems associated with the
formulation of an ideal discriminator for such
complex stimuli is presented in Chubb, Sperling
and Parish (1987).

Performance of the bracketed discriminator.
The super- and sub-ideal discriminators were
tested in a Monte Carlo series of trials, in which
they each were confronted with 90 stimuli in
each of the frequency bands at each of seven s/n
values chosen to best estimate their 50% performance
point. The s/n necessary for 50%
correct discriminations was estimated by an
inverse interpolation of the best fitting logistic
function. The derived $(s/n)_{50\%}$ is the measure
of performance of a discriminator. The mean
ratio, across frequency bands, of

$$(s/n)_{50\%,\ \text{sub-ideal}} \;/\; (s/n)_{50\%,\ \text{super-ideal}}$$

is about 2 (approx. 0.3 $\log_{10}$ units). The
ratio does not depend on the criterion of
performance.
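The threshold estimate described above (fit a logistic function, then invert it at 50% correct) can be sketched as follows. The parameterization is an assumption made for illustration: a logistic in $\log_{10}(s/n)$ rising from the 1/26 guessing rate to 1, fitted by a coarse least-squares grid search. The source does not state the exact form or fitting method, and the data below are synthetic.

```python
import numpy as np

CHANCE = 1.0 / 26.0   # guessing rate with 26 alternative letters

def logistic(log_sn, midpoint, slope):
    """Proportion correct rising from CHANCE to 1 as log10(s/n) increases."""
    return CHANCE + (1.0 - CHANCE) / (1.0 + np.exp(-slope * (log_sn - midpoint)))

def fit_logistic(log_sn, p_correct):
    """Least-squares fit by exhaustive search over a coarse parameter grid."""
    best_err, best_mid, best_slope = np.inf, None, None
    for midpoint in np.linspace(log_sn.min(), log_sn.max(), 201):
        for slope in np.linspace(0.5, 20.0, 196):
            err = np.sum((logistic(log_sn, midpoint, slope) - p_correct) ** 2)
            if err < best_err:
                best_err, best_mid, best_slope = err, midpoint, slope
    return best_mid, best_slope

def threshold(midpoint, slope, criterion=0.5):
    """Invert the fitted logistic: log10(s/n) at which p_correct = criterion."""
    z = (criterion - CHANCE) / (1.0 - CHANCE)
    return midpoint + np.log(z / (1.0 - z)) / slope

# Synthetic example: seven s/n values, noise-free data from a known curve.
log_sn = np.linspace(-1.0, 0.5, 7)
p_obs = logistic(log_sn, -0.3, 6.0)
mid_hat, slope_hat = fit_logistic(log_sn, p_obs)
log_sn_50 = threshold(mid_hat, slope_hat)
```

Because of the guessing floor, the 50% point lies slightly below the logistic midpoint; with real data each proportion would come from a finite number of trials, and a maximum-likelihood fit would be the more usual choice.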

Efficiency of human discrimination

In all conditions, human subjects perform
worse than the sub-ideal discriminator. Notably,
with no added luminance noise, the sub-ideal
(and, of course, the ideal) discriminator functions
perfectly, even in $b_0$, where subject performance
is at chance, and in $b_1$, where subjects
reached asymptote at about 50% correct.

Data from the subjects are plotted with the
$(s/n)_{50\%}$ sub-ideal and $(s/n)_{50\%}$ super-ideal in
Fig. 5. For comparison, Fig. 5 also shows the
performance of a correlator discriminator, which
chooses the letter template that correlates most
highly with the stimulus in the space domain. In
the coordinates of Fig. 5 ($\log_{10} s/n$ vs $\log_{10} f$,
where $f$ represents the mean 2D spatial frequency
of the band), the vertical distance $d$ from
the human data, $\log_{10}(s/n)_{50\%,\,\text{human}}$, down to the
bracketed discriminator, $\log_{10}(s/n)_{50\%,\,\text{ideal}}$,
represents the $\log_{10}$ of the factor by which the
bracketed discriminator outperforms the human
observer at that value of $f$. For the purpose
of specifying efficiency, we assume the ideal
discriminator lies at the mid-point of the sub-
and super-ideal discriminators in Fig. 5. The
efficiency eff of human discrimination relative
to the bracketed discriminator is $10^{-2d}$,
where

$$d = \log_{10}(s/n)_{50\%,\,\text{human}} - \log_{10}(s/n)_{50\%,\,\text{ideal}}.$$

Fig. 7. Discrimination efficiency as a function of the mean
frequency of a 2-octave band (in cycles per letter height),
indicated on a logarithmic scale. Data are shown for three
observers: SAW, RS and DHP. The viewing distance is 1.21 m,
which is representative of all viewing distances tested.

The values of eff in each object frequency
band are shown in Fig. 7. In band 0, eff is zero
because human performance never reaches
50%; indeed, it never rises significantly above
4% (chance). In band 1, human performance
asymptotically climbs close to 50% as s/n
approaches infinity; eff ≈ 0. In band 2, eff reaches
its maximum of roughly 35% to over 40% (depending
on the subject), and it declines rapidly with increasing
frequency.
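The efficiency computation can be sketched numerically. This sketch assumes that s/n is an amplitude ratio and that efficiency is defined on squared (energy) thresholds, so that eff = 10^(-2d); it also places the ideal threshold at the midpoint of the sub- and super-ideal thresholds on the logarithmic axis of Fig. 5. The threshold values in the example are hypothetical and are not data from the study.

```python
import math

def efficiency(sn_human, sn_sub_ideal, sn_super_ideal):
    """Efficiency from 50%-correct threshold s/n values.

    Assumes amplitude s/n ratios and an energy-based efficiency,
    eff = 10 ** (-2 * d), where d is the log10 distance from the human
    threshold down to the bracketed ideal threshold.
    """
    # Ideal threshold: midpoint of the bracket on a log10 axis.
    log_ideal = 0.5 * (math.log10(sn_sub_ideal) + math.log10(sn_super_ideal))
    d = math.log10(sn_human) - log_ideal
    return 10.0 ** (-2.0 * d)

# Hypothetical thresholds: the sub-ideal needs twice the s/n of the super-ideal.
eff_example = efficiency(sn_human=0.2182, sn_sub_ideal=0.2, sn_super_ideal=0.1)
eff_at_ideal = efficiency(sn_human=0.1, sn_sub_ideal=0.1, sn_super_ideal=0.1)
```

Under these assumptions, a human threshold about 1.5 times the bracketed ideal threshold corresponds to an efficiency near 42%.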

The 42% average efficiency in band 2 is
similar in magnitude to the highest efficiencies
observed in comparable studies. For example,
efficiency has been determined for detecting
various kinds of patterns in arrays of random
dots (Barlow, 1978, 1980; van Meeteren &
Barlow, 1981), tasks which, like ours, may
require significant cognitive processing. In a
wide range of conditions, the highest efficiencies
observed were about 50%, and frequently
lower. Van Meeteren and Barlow (1981) also
found that efficiency was perfectly correlated
with object spatial frequency and was independent
of retinal spatial frequency.

Spatial correlator discriminator. A correlator
discriminator cross-correlates the presented
stimulus with its memory templates and chooses
the template with the highest correlation.
Correlation can be carried out in the space or in the
frequency domain. Correlation is an efficient
strategy when noise in adjacent pixels is independent
and when members of the set of signals
have the same energy; both of these conditions
are violated by our stimuli. However, when
sufficient prior information is available to subjects,
they do appear to employ a cross-correlation
strategy (Burgess, 1985).
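A space-domain correlator of the kind described above can be sketched in a few lines. The use of a normalized (Pearson) correlation is an assumption made here for illustration; the source does not specify whether the correlation was normalized. The templates in the example are random stand-ins, not filtered letters.

```python
import numpy as np

def correlator_choice(stimulus, templates):
    """Index of the template with the highest Pearson correlation to the stimulus."""
    s = stimulus.ravel() - stimulus.mean()
    t = templates.reshape(len(templates), -1)
    t = t - t.mean(axis=1, keepdims=True)            # remove each template's mean
    r = (t @ s) / (np.linalg.norm(t, axis=1) * np.linalg.norm(s))
    return int(np.argmax(r))

rng = np.random.default_rng(2)
templates = rng.normal(size=(26, 30, 30))            # hypothetical stand-in templates
stimulus = templates[7] + 0.1 * rng.normal(size=(30, 30))
choice = correlator_choice(stimulus, templates)
```

Normalizing by template energy sidesteps the unequal-energy problem mentioned in the text, but it does nothing about correlated noise in adjacent pixels.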

It is interesting to note that the performance
of the spatial correlator discriminator over the
middle range of spatial frequencies is close
to the performance of the sub-ideal discriminator.
At high spatial frequencies, correlator
performance degenerates, due to its inability to
focus spatially on those pixel locations that
contain the most information. A spatial correlator
that optimally weighted spatial locations
could overcome the spatial focusing problem at
high frequencies. (Spatial focusing is treated in
the next section.)

At all frequencies, the spatial correlator is
nonideal because noise at spatially adjacent pixels
is not independent. At low spatial frequencies,
the nonindependence of adjacent locations becomes
extreme and the correlator fails miserably.
This points out that, for our stimuli,
correlation detection is better carried out in the
frequency domain because there the noise at
different frequencies is independent. The qualitative
similarity between the correlator discriminator
and the subjects' data suggests that
the subjects might be employing a spatial
correlation strategy, augmented by location
weighting at high frequencies.

Lowest spatial frequencies sufficient for letter
discrimination. Band 2 corresponds to a 2-octave
band with a peak frequency of 1.05
c/object (vertical height of letters) and a 2D
mean frequency of 1.49 c/object. At the four
viewing distances, 1.05 c/object corresponds to
retinal frequencies of 0.074, 0.234, 0.739 and
2.34 c/deg of visual angle. We observe perfect
scale invariance: all of these retinal frequencies,
and hence the visual channels that process this
information, are equally effective in achieving
the high efficiency of discrimination.

The finding that $b_2$, with a center frequency of
1.05 c/object and a half-amplitude cutoff at 2.1
c/object, is critical for letter discrimination is in
good agreement with previous findings of both
Ginsburg (1978) for letter recognition and
Legge et al. (1985) for reading rate. Legge et al.
used low-pass filtered stimuli, which included
not only spatial frequencies within an octave of
1 c/object ($b_2$) but also all lower frequencies.
From the present study, we expect
human performance with low-pass and with
band-pass spatial filtering to be quite similar up
to 1 c/object because the lowest frequency
bands, when presented in isolation, are perceptually
useless (at least when presented alone).

It is noteworthy that our subjects
actually perform better, in the sense of achieving
criterion performance at a lower s/n ratio, in
frequency bands higher than $b_2$. This is explained
by the increase in stimulus information
in higher frequency stimuli. Increased information
more than compensates for the subjects'
loss in efficiency as spatial frequency increases.

Components of discrimination performance

Though the performance of the bracketed
ideal discriminator is useful in quantifying the
informational utility of the various bands, it is
instructive to consider the changing physical
structure of the stimuli as well. What components
of the stimuli actually lead to a gain in
information with increasing frequency? According
to Shannon's theorem (Shannon & Weaver,
1949), an absolutely bandlimited 1-D signal can
be represented by a number of samples $m$ that
is proportional to its bandwidth. When the
signal-to-noise ratio in each sample, $s_i/n_i$, is the
same, the overall signal-to-noise ratio s/n grows
as $\sqrt{m}$. In the space domain, our filters were
constructed (approximately) to differ only in
scale but not in the shape of their impulse
responses. Therefore, when the mean frequency
of a filter band increased by a factor of 2, the
bandwidth also increased by 2. Since the stimuli
are 2D, the effective number of samples increases
with the square of frequency, and the
increase in effective s/n ratio is proportional to
$f$. This expected improvement with frequency,
based simply on the increase in effective number
of samples, is indicated by the oblique parallel
lines of Fig. 5 with slope of -1. The expected
improvement in threshold s/n due simply to the
linearly increasing bandwidth of the bands does
a reasonable job of accounting for the improvement
in performance for both human and
bracketed discriminators between $b_2$ and the band
with a mean frequency of 5.8 c/object.
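The slope of -1 follows directly from the sampling argument above. As a short worked derivation, assume that criterion performance requires a fixed effective signal-to-noise ratio $c$, and write $(s/n)_{\text{sample}}$ for the ratio in each sample; both symbols are notational assumptions introduced here rather than the authors' notation.

```latex
\begin{align*}
m &\propto f^{2}
  && \text{(2D stimuli, bandwidth proportional to } f\text{)} \\
(s/n)_{\text{eff}} &= \sqrt{m}\,(s/n)_{\text{sample}} \;\propto\; f\,(s/n)_{\text{sample}} \\
(s/n)_{\text{eff}} = c \;&\Rightarrow\; (s/n)_{50\%} \;\propto\; \frac{c}{f} \\
\log_{10}(s/n)_{50\%} &= -\log_{10} f + \text{const.}
\end{align*}
```

On the log-log axes of Fig. 5, the last line is a straight line of slope -1, so any steeper decline in threshold reflects information beyond the increase in the number of samples.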

Performance of all discriminators improves
faster with frequency between 0.39 and 1.5
c/object and between 5.8 and 22 c/object than is
predicted from the bandwidths of the images. A
slope steeper than -1 means that there is more
information for discriminating letters in higher
frequency bands even when the number of
independent samples is kept the same in each
band. Once sampling density is controlled, just
how much information letters happen to contain
in each frequency band is an ecological
property of upper-case letters.


Increasing spatial localization with increasing
frequency band. From the human observer's
point of view, the letter information in low-pass
filtered images is spread out over a large portion
of the total image array. In high spatial-frequency
images, the letter information is concentrated
in a small proportion of the total number
of pixels. In high spatial-frequency images, a
human observer who knows which pixels to
attend will experience an effective s/n that is
higher than an observer who attends equally to
all pixels. In this respect, humans differ from an
ideal discriminator. The ideal discriminator has
unlimited memory and processing resources,
does not explicitly incorporate any selective
mechanism into its decision, and uses the same
algorithm in all frequency bands. Information
from irrelevant pixels is enmeshed in the
computation but cancels out perfectly in the
letter-decision process. To understand human
performance, however, it is useful to examine
how, with our size-scaled spatial filters, letter
information comes to occupy a smaller and
smaller fraction of the image array as spatial
frequency increases.

Here we consider three formulations of the
change in the internal structure of the images
with increasing spatial frequency: (1) spatial
localization, (2) correlation between signals, and
(3) nearest neighbor analysis. We have already
noted that, in our images, the information-rich
pixels become a smaller fraction of the total
pixels as frequency band increases. Indeed, this
reduction can be estimated by computing the
information transmitted at any particular pixel
location or, more appropriately for estimating
noise resistance, by computing the variance of
intensity (at that pixel location) over the set of
26 alternative signals.

To demonstrate the degree of increasing
localization with increasing frequency, the variance
(over the set of 26 letter templates) was
computed at each pixel location (x, y). Total
power, the total variance, is obtained by summing
over pixel locations. The number of pixel
locations needed to achieve a specific fraction of
the total power is given in Fig. 8, with frequency
band as a parameter. These curves describe the
spatial distribution of information in the letter
templates. If all pixels were equally informative,
exactly half of the total number of pixels would
be needed to account for 50% of the total
power. The solid curves in Fig. 8 show that the
number of pixels needed to convey any percentage
of total signal power decreases as the
frequency band increases. These information
distribution curves are an ecological property of
our set of letter stimuli; different curves would
be needed to describe other stimulus sets.

Fig. 8. Fraction of total power contained in the n most
extreme-valued pixels as a function of n (out of 8100). Solid
lines indicate the power fractions for signals; the curve
parameter indicates the filter band. Dashed lines indicate
power fractions for filtered noise fields. Although power
fractions from successive bands of noise are too close to
label, they generally fall in the same left-to-right (5 to 0)
order as those for signal bands.

The dashed curves in Fig. 8 were derived from
random noise filtered in each of the six frequency
bands ($b_0$ to $b_5$). The distribution of noise
power is very similar between the various bands,
enormously more so than the distribution of
signal power. For our letter stimuli, stimulus
information coalesces to a smaller number of
spatial locations as spatial frequency increases.
Correlation between signals. A more abstract
way of describing the change of information
with bandwidth is to note that letters become
less confusible with each other in the higher
frequency bands. A good measure of confusibility
is the average pairwise correlation between
the 26 letter templates in each frequency band
(Table 3). The average correlation between
letter templates diminishes from 0.94 in band 0
to 0.31 in band 5. In a band in which templates
have a pairwise correlation over 0.9, the
overwhelming amount of intensity variation
("information") is useless for discrimination. Small
wonder that subjects fail completely in this
band. Overall, performance of the ideal discriminator
and of observers improves as the
correlation decreases, but there is no obvious
way to use the pairwise correlation between
templates to predict performance.
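The average pairwise correlation can be computed as in the sketch below. It assumes a Pearson correlation (each template's mean removed), which the source does not specify, and uses synthetic templates that share a common component to mimic highly confusible low-frequency letters.

```python
import numpy as np

def mean_pairwise_correlation(templates):
    """Average Pearson correlation over all distinct pairs of templates."""
    flat = templates.reshape(len(templates), -1)
    r = np.corrcoef(flat)                        # rows are treated as variables
    upper = np.triu_indices(len(flat), k=1)      # each distinct pair counted once
    return float(r[upper].mean())

rng = np.random.default_rng(4)
common = rng.normal(size=500)
similar = common + 0.1 * rng.normal(size=(26, 500))    # nearly identical templates
unrelated = rng.normal(size=(26, 500))                 # independent templates
r_similar = mean_pairwise_correlation(similar)
r_unrelated = mean_pairwise_correlation(unrelated)
```

With 26 templates the average is taken over 325 distinct pairs.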

Nearest neighbors. The analysis of nearest
neighbors is a useful technique for predicting
accuracy by the analysis of the possible causes
of errors. We can regard a filtered image $t_i$ of
letter $i$ as a vector in a space of dimensionality
8100 (90 × 90 pixels). When noise is added, the
possible positions of $t_i$ are described by a cloud
whose dimensions are determined by the s/n
ratio. A neighboring letter $k$ may be confused
with letter $i$ when the cloud around $t_i$ envelopes
$t_k$. The closer the neighbor, the greater the
opportunity for error. Table 3 gives the average
normalized distance to the nearest neighbor in
each of the bands. The increase in distance to
the nearest neighbor reflects the improvement in
the representation of signals as spatial frequency
increases.

Table 3. Average pairwise correlations and nearest
neighbors (Euclidean distance × 10^-[exponent illegible])

Band    Correlation    Nearest neighbor
0       0.94           0.01
1       0.91           0.30
2       [illegible]    1.3
3       0.38           2.3
4       0.33           3.1
5       0.31           4.1
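The nearest-neighbor measure can be sketched as follows. The normalization used in Table 3 is not stated in this section, so the sketch assumes, purely for illustration, that each template is scaled to unit length before Euclidean distances are taken.

```python
import numpy as np

def mean_nearest_neighbor_distance(templates):
    """Average Euclidean distance from each template to its nearest neighbor.

    Assumes unit-length normalization of every template (an illustrative
    choice, not necessarily the normalization behind Table 3).
    """
    flat = templates.reshape(len(templates), -1).astype(float)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    gram = flat @ flat.T                                   # pairwise inner products
    sq_dist = np.clip(2.0 - 2.0 * gram, 0.0, None)         # |a - b|^2 for unit vectors
    np.fill_diagonal(sq_dist, np.inf)                      # ignore self-distances
    return float(np.sqrt(sq_dist.min(axis=1)).mean())

# Three vectors in a plane: two orthogonal axes and their diagonal.
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_distance = mean_nearest_neighbor_distance(toy)
```

Using the Gram matrix avoids forming all pairwise difference vectors, which matters little for 26 letters but keeps the memory cost independent of the 8100 pixel dimensions.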

We consider possible causes of lower
efficiency of discrimination in bands below $b_2$.
The letters in these bands have high pairwise
correlations and the mean band frequency is
less than the object frequency. This means
that letters differ only in subtle differences of
shading, a feature that we usually do not think
of as shape. Observers would need to be able to
utilize small intensity differences to distinguish
between letters. To eliminate an alternative
explanation (the smaller number of frequency
components in the low-frequency bands), we
conducted an informal experiment with a lower
fundamental frequency. The fundamental frequency,
which is outside the band, nevertheless
determines the spacing of frequency components
within the band. Reducing the fundamental
frequency of the letter by one-half
increases the number of frequency components
in the band by a factor of 4. (A 256 × 256
sampling grid was used rather than 128 × 128.)
These 4× more highly sampled stimuli were not
more discriminable than the original stimuli.
This suggests that the internal letter representation
(template) that subjects bring with them
to the experiment cannot utilize low-frequency
information, even when it is abundantly available.
Whether, with sufficient training, subjects
could learn to use low spatial frequencies to
make letter discriminations is an open question.

SUMMARY AND CONCLUSIONS

1. Visual discrimination of letters in noise,
spatially filtered in 2-octave wide bands, is
independent of viewing distance (retinal frequency)
but improves as spatial frequency
increases.

2. The improvement in performance with
increasing spatial frequency results mainly from
an increase in the objective amount of information
transmitted by the filters with increasing
frequency (because filter bandwidth was proportional
to center frequency), which is manifested
as objectively less confusible stimuli in the
higher bands.

3. The comparison of human performance
with that of an estimated ideal discriminator
demonstrates that humans achieve optimal
discrimination (a remarkable 42% efficiency)
when letters are defined by a 2-octave band of
spatial frequencies centered at 1 cycle per letter
height (mean frequency 1.5 c/letter). This high
efficiency of discrimination is maintained over a
32:1 range of viewing distances.

4. Detection efficiency was invariant over a
range of retinal spatial frequencies in which the
contrast threshold for detection of sine gratings
(the modulation transfer function, MTF) varies
enormously. The independence of detection
performance and retinal size held for all frequency
bands.

5. A part of the loss of human efficiency in
discrimination as spatial frequency exceeded 1
c/object height may have been due to the subjects'
inability to identify, to selectively attend,
and to utilize the smaller fraction of information-rich
pixels in the higher frequency images.

6. Finally, it is important to note that
without the comparison to the ideal observer,
we would not have been able to understand the
components of human performance in the
different frequency bands.
Acknowledgements: We acknowledge the large contribution
of Charles Chubb to the formulation and solution of
the ideal discriminator. We thank Michael S. Landy for
helpful comments and Robert Picardi for skillful technical
assistance. The project was supported by USAF, Life
Sciences Directorate, Visual Information Processing Program,
grants 85-0364 and 88-0140.


APPENDIX 

Both sub-ideal and super-ideal discriminators must compute
estimates of the likelihood that the stimulus $u_{k,b}$ was
produced with template $t_{i,b}$ and noise $n_b$, where $k$ is the
letter used to generate the stimulus, $i$ is an arbitrary letter,
and $b$ indexes spatial frequency band. Let $x$ be an index on the
pixels of the image, $1 \le x \le 8100$, for the 90 x 90 images of
the experiments.

For the Monte Carlo simulations of the super-ideal
discriminator, the unknown stimulus parameters, $\beta_{k,b}$ and
$\sigma_n$, are computed during stimulus construction and their exact
values are supplied to the discriminator a priori. The
sub-ideal discriminator, however, must estimate these
parameters from the data as follows.

Sub-Ideal Parameter Estimation

Recall that stimulus contrast is modulated for any pixel
$x$ in the image:

$$u_{k,b}(x) = \beta_{k,b}\,[\,t_{k,b}(x) + n_b(x)\,] + q_{k,b}(x). \tag{A1}$$

The scaling constant $\beta_{k,b}$ limits the range of real values for
each pixel, prior to quantization, to the open interval (-0.5,
255.5); the addition of $q_{k,b}(x)$, called quantization noise,
rounds off pixel values to integers.

For each bandpass filtered template $t_{i,b}$ we first compute
the correlation $\rho_{i,b}$ of the template to the stimulus $u_{k,b}$:

[Equation (A2): not recoverable from the source scan.]


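To compute the likelihood estimates for each template $t_{i,b}$,
we must be able to reverse the effect of $\beta_{k,b}$. Thus we define
$\alpha_{i,b} = 1/\beta_{k,b}$, and choose $\alpha_{i,b}$ so as to
minimize the expression

[Equation (A3): not recoverable from the source scan.]

Solving for $\alpha_{i,b}$ gives us

[Equation (A4): not recoverable from the source scan.]

Finally we set

[Equation (A5): not recoverable from the source scan.]

where $X = 8100$, the number of pixels in the image.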


Likelihood Estimation

With estimates of $\alpha_{i,b}$ and $\sigma_n$ for the sub-ideal
discriminator, and the a priori values for the super-ideal
discriminator, we can formulate a maximum likelihood
estimator. Rearranging terms of equation (A1) and
dividing both sides by $\beta_{k,b}$ yields

$$\frac{u_{k,b}(x)}{\beta_{k,b}} - t_{k,b}(x) = n_b(x) + \frac{q_{k,b}(x)}{\beta_{k,b}}. \tag{A6}$$


Substituting $\alpha$ for $1/\beta$, and transposing into the
frequency domain, denoted by upper-case letters and indexed
by $\omega$, we have

$$\alpha_{i,b}\,U_{k,b}(\omega) - T_{i,b}(\omega) = N_b(\omega) + \alpha_{i,b}\,Q_{k,b}(\omega). \tag{A7}$$

Note that the left side of equation (A7) is simply a
difference image between the stimulus $\alpha_{i,b} U_{k,b}(\omega)$ and the
template $T_{i,b}(\omega)$. This difference is exactly equal to the sum
of the luminance and quantization noise only when the
correct template is chosen ($i = k$). When the incorrect
template is chosen ($i \ne k$), the right-hand side of equation
(A7) is equal to the sum of the noise sources plus some
residue that is equal to $T_{k,b}(\omega) - T_{i,b}(\omega)$. Under the
assumption that quantization noise can be modeled as
independent additive noise in the frequency domain, the
density $\Lambda$ of the joint realization of the right-hand side of
equation (A7) is given by

[Equation (A8): not recoverable from the source scan.]

where $F_b(\omega)$ is simply the kernel of filter $b$ in the frequency
domain. Dropping the multiplicative term in equation (A8),
which does not depend on the template $T$, and taking logs,
the ideal discriminator chooses the template that minimizes

[Equation (A9): not recoverable from the source scan.]

Finally, it is more convenient to compute the power of
the quantization noise in the space domain ($\sigma_q^2$) than in the
frequency domain ($\sigma_Q^2$). Spatial quantization noise,
$q_{k,b}(x)$, is uniformly distributed on the interval (-0.5, 0.5),
so that $\sigma_q^2$ is computed as

$$\sigma_q^2 = \int_{-0.5}^{0.5} q^2 \, dq \tag{A10}$$

and is equal to 1/12.

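The value 1/12 for the quantization-noise power can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name, sample count, and random seed are hypothetical, and real-valued pixels are simply drawn uniformly from (-0.5, 255.5) rather than generated by the stimulus construction of the experiments.

```python
import numpy as np

def quantization_noise_variance(n_samples=100_000, seed=0):
    """Empirical variance of the rounding error for real-valued pixels.

    Assumption: pixel values are drawn uniformly from (-0.5, 255.5) before
    being rounded to integers, so the error is uniform on (-0.5, 0.5).
    """
    rng = np.random.default_rng(seed)
    real_values = rng.uniform(-0.5, 255.5, size=n_samples)
    quantized = np.round(real_values)      # round to the nearest integer level
    noise = quantized - real_values        # quantization noise q(x)
    return float(np.var(noise))

if __name__ == "__main__":
    print(quantization_noise_variance(), 1.0 / 12.0)
```

The empirical variance should lie close to the analytic value 1/12, about 0.0833.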
THE KINETIC DEPTH EFFECT AND OPTIC FLOW - II.
FIRST- AND SECOND-ORDER MOTION

Michael S. Landy,¹ Barbara A. Dosher,² George Sperling¹ and Mark E. Perkins¹
¹Psychology Department, New York University, NY 10003 and ²Psychology Department,
Columbia University, NY 10027, U.S.A.

(Received 24 August 1989; in revised form 1 May 1990)

Abstract: We use a difficult shape identification task to analyze how humans extract 3D surface structure
from dynamic 2D stimuli, the kinetic depth effect (KDE). Stimuli composed of luminous tokens moving
on a less luminous background yield accurate 3D shape identification regardless of the particular token
used (either dots, lines, or disks). These displays stimulate both the 1st-order (Fourier-energy) motion
detectors and 2nd-order (nonFourier) motion detectors. To determine which system supports KDE, we
employ stimulus manipulations that weaken or distort 1st-order motion energy (e.g. frame-to-frame
alternation of the contrast polarity of tokens) and manipulations that create microbalanced stimuli which
have no useful 1st-order motion energy. All manipulations that impair 1st-order motion energy
correspondingly impair 3D shape identification. In certain cases, 2nd-order motion could support limited
KDE, but it was not robust and was of low spatial resolution. We conclude that 1st-order motion detectors
are the primary input to the kinetic depth system. To determine minimal conditions for KDE, we use a
two frame display. Under optimal conditions, KDE supports shape identification performance at 63-94%
of full-rotation displays (where baseline is 5%). Increasing the amount of 3D rotation portrayed or
introducing a blank inter-stimulus interval impairs performance. Together, our results confirm that the
human KDE computation of surface shape uses a global optic flow computed primarily by 1st-order
motion detectors with minor 2nd-order inputs. Accurate 3D shape identification requires only two views
and therefore does not require knowledge of acceleration.

KDE  Kinetic  depth  effect  Structure  from  motion  Shape  Optic  flow 


INTRODUCTION 

When a collection of randomly positioned dots
moves on a CRT screen with motion paths that
are projections of rigid 3D motion, a human
viewer perceives a striking impression of
three-dimensionality and depth. This phenomenon
of depth computed from relative motion cues
is known as the kinetic depth effect (KDE;
Wallach & O'Connell, 1953).

What are the important cues that lead to a 3D
percept from such a display? Is it motion, or are
there other important cues? If it is motion, then
what kind of motion detection system(s) are
used to support the structure-from-motion
computation? Is a computation of velocity sufficient,
or are more elaborate measurements necessary,
such as of acceleration? These are the questions
that we address in this paper.
In a series of recent papers (Dosher, Landy &
Sperling, 1989a, b; Sperling, Landy, Dosher &
Perkins, 1989; Sperling, Dosher & Landy, 1990),
we examined the cues necessary for subjects to
perceive an accurate representation of a 3D
surface portrayed using random dot displays. In
each trial of a new shape identification task we
devised, subjects view a random dot representation
of one of a set of 53 3D shapes and
identify the shape and rotation direction. Shape
identity feedback optimizes the subject's ability
to compute shape from each type of motion
stimulus. For accurate performance, the task
requires either a 3D percept or a subject strategy
that uses 2D velocity information in a manner
that is computationally equivalent to that
required to solve for 3D shape (Sperling et al.,
1989, 1990; see the discussion of expt 2, below).
We have shown that the only cue used for the
perception of three-dimensionality in these
displays is motion (Sperling et al., 1989, 1990).
Further experiments determined that global
optic flow is used rather than the position
information for individual dots, since accuracy
remains high when dot lifetimes are reduced to
as little as two frames (Dosher et al., 1989b). In
that paper, we concluded that the input to the
KDE computation is an optic flow generated by
a 1st-order motion detection mechanism, such
as the Reichardt detector (Reichardt, 1957).
Two manipulations that perturb 1st-order
motion energy mechanisms, flicker and polarity
alternation, also interfered with KDE
(Dosher et al., 1989b). In polarity alternation,
dots change over time from black to white to
black on a gray background. When compared to
dots that remain white, polarity-alternating dots were
equally or slightly more detectable in a detection
task, were poorer but still well above chance in
a discrimination of direction of motion task
(computed, presumably, using tracking of the
dots or using more elaborate, 2nd-order motion
detection mechanisms), but were useless for tasks
requiring KDE or motion segregation. These
latter two tasks require the evaluation of velocity
in a number of locations simultaneously
(Sperling et al., 1989). Shape identification
performance in a range of conditions was shown to
be monotonic with a computed index of
1st-order net directional power in the stimuli
(Dosher et al., 1989b). Hence, for sparse
dot stimuli, KDE depends upon a simple
spatio-temporal (1st-order) Fourier analysis of
multiple local areas of the stimulus.

In this paper, we further examine and generalize
the contributions of several types of
motion detectors to the optic flow computations
used by the structure-from-motion mechanism.

MOTION  ANALYSIS  MODELS  AND  THE  KDE 

1st-order motion analysis

To motivate the stimulus conditions studied
here, we begin by summarizing models of early
motion detection and analysis. Several recent
motion detection models (van Santen & Sperling,
1984, 1985; Adelson & Bergen, 1985; Watson
& Ahumada, 1985) share as a common
antecedent the model proposed by Reichardt
(1957). We refer to this class of models as
1st-order motion detectors. Below, 2nd-order
mechanisms involving additional processing
stages will be discussed. In the Reichardt detector,
luminance is measured at two spatial locations
A and B. The measurement at position
A is delayed in time, and then cross-correlated
over time with the measurement at position B,
resulting in a "half-detector" sensitive to
motion from position A to B. A second such
"half-detector" sensitive to motion from B to A
is set in opponency with the first, resulting in the
full motion detector. van Santen and Sperling
(1984, 1985) have investigated this model along
with extensions involving voting rules for combining
outputs of many detectors to enable
predictions of psychophysical experiments,
resulting in their Elaborated Reichardt Detector
(ERD).
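The delay-and-correlate scheme just described can be sketched numerically. The following is a minimal illustration, not the ERD of van Santen and Sperling: the function names, the one-frame delay, the one-pixel spacing between A and B, and the toy single-dot stimulus are all assumptions made for this example.

```python
import numpy as np

def moving_dot(frames=8, width=16, step=1, alternate=False):
    """Contrast array (frames x positions) for one dot moving `step` pixels/frame.

    Hypothetical toy stimulus: background contrast is 0 and the dot is +1.
    With alternate=True the dot's contrast polarity flips on every frame.
    """
    stim = np.zeros((frames, width))
    start = width // 2
    for t in range(frames):
        polarity = -1.0 if (alternate and t % 2 == 1) else 1.0
        stim[t, start + t * step] = polarity
    return stim

def reichardt_output(stim, delay=1, spacing=1):
    """Summed opponent output of Reichardt detectors tiled over the stimulus.

    Position A is the left input and B the right input of each detector.
    Positive output signals motion from A to B (toward larger indices).
    """
    stim = np.asarray(stim, dtype=float)
    a_past = stim[:-delay, :-spacing]   # delayed measurement at A
    b_past = stim[:-delay, spacing:]    # delayed measurement at B
    a_now = stim[delay:, :-spacing]     # current measurement at A
    b_now = stim[delay:, spacing:]      # current measurement at B
    a_to_b = a_past * b_now             # half-detector for A -> B motion
    b_to_a = b_past * a_now             # half-detector for B -> A motion
    return float(np.sum(a_to_b - b_to_a))

if __name__ == "__main__":
    print(reichardt_output(moving_dot(step=1)))                  # rightward dot
    print(reichardt_output(moving_dot(step=-1)))                 # leftward dot
    print(reichardt_output(moving_dot(step=1, alternate=True)))  # polarity alternation
```

In this toy setting the sign of the output follows the direction of a constant-polarity dot, and frame-to-frame polarity alternation reverses the sign.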

An alternative way of characterizing motion
detection is in the frequency domain. A motion
detector can be built of several linear
spatio-temporal filters. Each filter is sensitive only to
energy in two of the four quadrants in
spatio-temporal Fourier space (ω_x, ω_t). In other
words, the filters are not separable. Their receptive
fields are oriented in space-time, and thus
they are sensitive to motion in a particular
direction and at a particular scale (Adelson &
Bergen, 1985; Burr, Ross & Morrone, 1986;
Watson & Ahumada, 1985). The Fourier
"energy" (the squared output of a quadrature
pair of filters) in each of two opposing motion
directions is computed, and put in opponency.
This "motion energy detector", proposed by
Adelson and Bergen (1985), and the ERD differ
in their construction and in the signals available
at the subunit level, but are indistinguishable at
their outputs (Adelson & Bergen, 1985; van
Santen & Sperling, 1985).

The structure-from-motion computation relies
upon the measurement of image velocities
at several image locations. The KDE shape
identification task that we use here can be solved
by categorizing velocity at six spatial locations
into three categories: leftward, approximately
zero, and rightward (Sperling et al., 1989). Thus,
in order to discriminate the 53 test shapes
by KDE, motion detection must be followed
by at least some rudimentary local velocity
calculation.

In order to signal velocity, the outputs of
more than one such 1st-order motion detector
must be pooled. Speed may be computed by
pooling only two detectors (a motion and a
"static" detector; Adelson & Bergen, 1985). To
signal motion direction, signals must be pooled
across a variety of orientations (Watson &
Ahumada, 1985). Finally, in order to solve the
"aperture problem" for more complex stimuli
(Burt & Sperling, 1981; Marr & Ullman, 1981),
signals may be pooled over a variety of
directions and perhaps scales (Heeger, 1987).

In the previous paper (Dosher et al., 1989b),
shape identification performance was shown to
relate directly to the quality of the signal available
from 1st-order motion detection mechanisms.
Each stimulus consisted of a large number
of dots on a gray background representing a 2D
projection of dots on the surface of a smooth 3D
shape under rotary oscillation. In one condition
(contrast polarity alternation), the dots were
first brighter than the background ("white-on-gray"),
then darker than the background
("black-on-gray"), then bright again, in successive
frames. For a dense random dot field (50%
black/50% white) under simple planar motion,
polarity alternation causes a percept of motion
opposite to the true direction of motion (the
"reverse-phi phenomenon"; Anstis & Rogers,
1975). Reverse-phi is thought to reflect a
spatio-temporal Fourier analysis of the stimulus, since
contrast reversal reverses the direction of
motion of the lowest-frequency Fourier components
(van Santen & Sperling, 1984). With
contrast reversal, the outputs of 1st-order
motion detection mechanisms no longer simply
signal the intended direction and velocity of
motion. Contrast reversal stimuli do not yield
a depth-from-motion percept (Dosher et al.,
1989b). We take this as evidence that the
KDE relies upon input from a 1st-order motion
analysis.

2nd-order motion analysis

For the sparse random dot stimuli (Dosher et
al., 1989b), contrast polarity alternation eliminated
the perception of structure from motion.
Nonetheless, subjects could judge the direction
of patches of contrast polarity alternating dots
undergoing simple translation. What kind of a
motion detector might be used to correctly
judge the motion of a translating,
polarity-alternating dot? One simple possibility would be
to first apply a luminance nonlinearity to the
input stimulus. For example, if the input stimulus
were full-wave rectified about the mean
luminance, the polarity-alternating stimulus
would be converted to the equivalent of rigid
motion of a white dot on a gray background.
Thus, a full-wave rectifier of contrast followed
by a 1st-order analyzer (such as those discussed
above) would be capable of analyzing such a
motion stimulus correctly (Chubb & Sperling,
1988b, 1989a, b).
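The effect of full-wave rectification on a polarity-alternating dot can be shown with a small numerical sketch. Everything here is an assumption made for illustration: the toy stimulus, the function names, and the use of a simple opponent correlator as a stand-in for a 1st-order analyzer. Contrast is taken to be already expressed as deviation from mean luminance, so rectification is just an absolute value.

```python
import numpy as np

def translating_dot(frames=6, width=12, alternate=False):
    """One dot moving rightward one pixel per frame (contrast, mean = 0).

    Hypothetical toy stimulus; with alternate=True the polarity flips
    between +1 and -1 on successive frames.
    """
    stim = np.zeros((frames, width))
    for t in range(frames):
        stim[t, t + 2] = -1.0 if (alternate and t % 2 == 1) else 1.0
    return stim

def full_wave_rectify(contrast):
    """Full-wave rectification about the mean luminance (contrast = 0)."""
    return np.abs(contrast)

def net_direction(stim):
    """Toy 1st-order opponent correlator; positive means rightward."""
    rightward = stim[:-1, :-1] * stim[1:, 1:]
    leftward = stim[:-1, 1:] * stim[1:, :-1]
    return float(np.sum(rightward - leftward))

alternating = translating_dot(alternate=True)
rectified = full_wave_rectify(alternating)

if __name__ == "__main__":
    print(net_direction(alternating))  # 1st-order analysis of the raw stimulus
    print(net_direction(rectified))    # same analysis after rectification
```

After rectification the stimulus is identical to a white dot translating on gray, so the same correlator that signals the wrong direction for the raw stimulus signals the correct direction for the rectified one.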

A motion detection system consisting of a
contrast nonlinearity followed by a 1st-order
detector is one example of a wide class of
"2nd-order detection mechanisms", each of
which consists of a linear filtering of the input
(spatial and/or temporal), followed by a contrast
nonlinearity, followed by a standard
1st-order motion detection mechanism. A number
of results demonstrate the existence of both 1st-
and 2nd-order motion mechanisms and show
the contribution of both to the perception of
planar motion (Anstis & Rogers, 1975; Chubb
& Sperling, 1988b, 1989a, b; Lelkens &
Koenderink, 1984; Ramachandran, Rao &
Vidyasagar, 1973; Sperling, 1976).

Can both 1st- and 2nd-order motion mechanisms
be used by the KDE system? The
polarity-alternating dots did not yield an effective KDE
percept of our 3D shapes. If one accepts the
existence of both 1st- and 2nd-order motion
mechanisms, why didn't the 2nd-order system
support KDE? The KDE stimuli were relatively
small (3.7 x 4.2 deg) and viewed foveally (eye
movements were permitted throughout the 2 sec
stimulus duration). Evidence from studies of
planar motion suggests that both systems were
available under these conditions (Chubb &
Sperling, 1988b). For polarity alternation
stimuli, the most salient low frequency components
from the 1st-order system were in
the wrong direction. We assume that the
2nd-order system yields a correct (if attenuated)
analysis. Poor shape identification performance
may have resulted either from the perturbed
1st-order analysis or because of competition
between the 1st- and 2nd-order systems (which
signaled opposite directions of motion in
some frequency bands). Our evidence (Dosher
et al., 1989b) demonstrated that 1st-order
system input is the predominant input to
KDE, but it did not exclude the possibility of
input from 2nd-order motion detection mechanisms.
To approach that question we consider
a KDE stimulus that produces a simple
2nd-order motion analysis, but to which
the 1st-order motion system is, statistically,
blind.

Microbalanced motion stimuli

Chubb and Sperling (1988b) defined a class of
stimuli, called microbalanced, among which are
stimuli with the properties that we desire. In
expt 1 we concentrate on two examples of
microbalanced motion stimuli. These stimuli are
random in the sense that any given stimulus is
a realization of a random process. As proven by
Chubb and Sperling (1988b), if a stimulus is
microbalanced then the expected output of
every 1st-order detector (ERD or motion
energy detector) will be zero. Thus, Chubb and
Sperling defined a class of stimuli for which a
consistent motion signal requires a 2nd-order
motion analysis, and showed that the
2nd-order analysis predicted observers' percepts for
several examples of the class.
The polarity alternation stimulus is not
microbalanced; any given frequency band does
show consistent motion, with the lowest spatial
frequencies signalling motion in the wrong
direction. This stimulus can be transformed into
a microbalanced one as follows: for each dot,
choose the contrast polarity randomly and
independently for every frame. Any given 1st-order
detector will be just as likely to signal rightward
motion as it is to signal leftward motion, since it
will either see the same contrast polarity across
any successive pair of frames or it will see
contrast polarity alternate, with equal probability.
One question we examine in this paper
is whether the motion signal available from
2nd-order mechanisms can be used to compute
3D structure.
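The argument that random per-frame polarity leaves a 1st-order detector without a consistent signal can be illustrated with a small Monte Carlo sketch. This is not the proof of Chubb and Sperling; the single-dot stimulus, the toy opponent correlator, the function names, the trial count, and the seed are all assumptions chosen for the example.

```python
import numpy as np

def random_polarity_dot(rng, frames=9, width=20):
    """Rightward-moving dot whose polarity is chosen independently per frame."""
    stim = np.zeros((frames, width))
    polarity = rng.choice([-1.0, 1.0], size=frames)
    stim[np.arange(frames), np.arange(frames) + 2] = polarity
    return stim

def net_direction(stim):
    """Toy 1st-order opponent correlator; positive means rightward."""
    rightward = stim[:-1, :-1] * stim[1:, 1:]
    leftward = stim[:-1, 1:] * stim[1:, :-1]
    return float(np.sum(rightward - leftward))

def mean_outputs(n_trials=2000, seed=0):
    """Average correlator output before and after full-wave rectification."""
    rng = np.random.default_rng(seed)
    first_order, second_order = [], []
    for _ in range(n_trials):
        stim = random_polarity_dot(rng)
        first_order.append(net_direction(stim))           # raw contrast
        second_order.append(net_direction(np.abs(stim)))  # rectified contrast
    return float(np.mean(first_order)), float(np.mean(second_order))

if __name__ == "__main__":
    print(mean_outputs())
```

Averaged over realizations, the raw (1st-order) output hovers near zero, while the rectified (2nd-order) pathway gives the same rightward signal on every trial.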

We present two experiments. In the first, we
examine performance on a shape identification
task for a variety of KDE stimuli. Several types
of stimuli provide good 1st-order motion.
Others are microbalanced and hence can only be
analyzed by 2nd-order mechanisms. Still others
offer good 1st-order motion, but involve
camouflage similar to that available in some of
the microbalanced conditions. We find that
1st-order motion is used, and that input from
2nd-order mechanisms may also be used but is
not as robust. In a second experiment, we
examine the residual shape percept from
two-frame KDE stimuli in order to determine
whether a single velocity field is a sufficient cue
for shape identification or whether acceleration
also is needed.

EXPERIMENT 1. POLARITY ALTERNATION,
MICROBALANCE, AND CAMOUFLAGE

In the first experiment, a shape discrimination
task is used with a variety of displays. First, in
order to sensibly compare results to our previous
work (Sperling et al., 1989; Dosher et al.,
1989b), there are control conditions that are
identical to those of our previous experiments
(the "Motion without density cue, standard
speed, standard intensity" and "Motion with
polarity alternation, standard speed, standard
intensity" conditions of the preceding paper). In
addition to dots, randomly positioned disks and
lines are also used here in order to examine the
effects of the foreground token used to carry the
motion. The disk and line tokens are larger than
the single pixel dots, and hence have more
contrast energy. They enable us to test whether
our previous failure to find KDE with polarity
alternation resulted from the low contrast
energy in the stimulus. Two forms of
microbalanced stimuli are used, allowing us to test
KDE shape identification performance with
stimuli to which 1st-order motion detectors are
blind. Finally, we examine stimuli in which
moving textured tokens are camouflaged by a
similarly textured background.

Method 

Subjects. There were three subjects in this
experiment. One was an author, and the other
two were graduate students naive to the purposes
of this experiment. All had normal or
corrected-to-normal vision. There were slight
differences in the conditions for each of the
three subjects. These will be pointed out below.

White-on-gray dot stimuli. First, we briefly
describe the stimuli that consist of bright dots
moving on a gray background representing a
variety of 3D shapes. This description will be
somewhat abbreviated, since the same stimuli
have been used in previous studies and more
complete descriptions are available (Sperling et
al., 1989). The other stimuli used in the present
study result from simple image processing
transformations applied to the white-on-gray dot
stimuli.

Stimuli were based upon a fixed vocabulary of
simple shapes consisting of bumps and concavities
on a flat ground. The 3D shapes varied in
the number, position, and 2D extent of these
bumps and concavities. The process of generating
the stimuli is illustrated in Fig. 1.

Fig. 1. Stimulus shapes, rotations, and their designations. (A) Shapes were constructed by choosing one
of the two equilateral triangles represented here. Each point in the triangles was given a positive depth
(i.e. toward the observer), zero depth, or negative depth, represented as +, 0 and -, respectively. A
smooth shape splined these three points to zero depth values outside of the circle. A shape is designated
by the choice of triangle (u or d), followed by the depth designations of the three points in the order given
in the figure. (B) Some representative shapes generated by this procedure. All shapes consisted of a bump,
concavity or both, with a variation in position and extent of these areas. (C) Shapes were represented
by a set of dots randomly painted on the surface of the shape, and wiggled about a vertical axis through
the center of the display. The motion was a sinusoidal rotation that moved the object so as to face off
to the observer's right, then his or her left, then back to face-forward (denoted l), or the reverse
(denoted r).

The first step in creating a stimulus involves
the specification of a 3D surface. For a square
area with sides of length s, a circle with diameter
0.9s is centered, and three fixed points, labeled
1, 2 and 3, are specified. For a given shape, one
of two such sets of points is used (the
upward-pointing triangle or the downward-pointing
triangle, labeled u and d, respectively). The shape
is specified as having a depth of zero outside of
the circle. For each of the three identified points,
the depth may be either +0.5s, 0.0, or -0.5s,
which are labeled as +, 0, and -, respectively.
The depth values for the rest of the figure were
interpolated by using a standard cubic spline to
connect the three interior points with the zero
depth surround. Thus, there are 54 ways to
designate a shape: u vs d, and for each of three
interior points, + vs 0 vs -. We designate a
shape by denoting the triangle used, followed by
the depth designations of the three points in the
order shown in Fig. 1A. For example, u+-0
is a shape with a bump in the upper-middle of
the display, and a concavity in the lower-left
(Fig. 1B). There are 53 distinct shapes, because
u000 and d000 both denote a flat square.
Displays  were  generated  by  sprinkling  dots 
randomly  on  the  3D  surface  generated  by  the 
spline,  rotating  that  surface,  and  projecting  the 
resulting  dot  positions  onto  the  image  plane 
using  parallel  perspective  A  large  number  of 
dots  are  chosen  unifonniy  over  a  2D  area 
somewhat  larger  than  the  s  by  s  square,  and 
each  dot's  depth  is  determined  by  the  cubic 
spline  inierpolant  (where  the  zero  depth  of  the 


surround  is  continued  outside  the  square)  This 
collection  of  dots  is  rotated  about  a  vertical  axis 
that  IS  at  zero  depth  and  centered  in  the  display. 
The  rotation  angle 0(k)is  a  sinusoidal  "w.ggle" 
±25sm(2at/30)  deg,  where  k  is  the 
frame  number  within  the  30  frame  display 
Thus,  the  display  either  rotated  25  deg  to  the 
right,  then  reversed  its  direction  until  it  faced 
25  deg  to  the  left,  then  reversed  its  direction 
until  It  was  again  facing  forward  (labeled  /),  or 
rotated  in  the  opposite  manner  (labeled  r,  see 
Fig  1C)  The  displays  presented  these  3D 
collections  of  dots  in  parallel  perspective 
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as  luminous  dots  (single  pixels)  on  a  darker 
background 
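
The rotation and projection steps just described can be sketched as follows. This is a minimal illustration, not the code used to build the displays: the function names, the choice of the y axis as "vertical", and the mapping of the sign `direction` onto the labels l and r are all assumptions made for the example.

```python
import numpy as np

N_FRAMES = 30          # frames per display, as stated in the text
AMPLITUDE_DEG = 25.0   # peak rotation angle, as stated in the text

def wiggle_angle_deg(k, direction=+1):
    """Rotation angle for frame k; direction=+1 or -1 selects one of the two wiggles."""
    return direction * AMPLITUDE_DEG * np.sin(2.0 * np.pi * k / N_FRAMES)

def project_frame(points_xyz, k, direction=+1):
    """Rotate dots about the vertical (y) axis through the origin, then discard depth."""
    theta = np.deg2rad(wiggle_angle_deg(k, direction))
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    x_rot = x * np.cos(theta) + z * np.sin(theta)
    # Parallel projection: the image coordinates are simply (x, y) after rotation.
    return np.column_stack([x_rot, y])
```

Because depth enters the image only through the term z sin(theta), only horizontal dot positions change from frame to frame.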

A stimulus name consists of the name of the
shape followed by the type of rotation (e.g.
u+-0l), resulting in 108 possible names. Using
parallel perspective, there is a fundamental
ambiguity with the KDE: reversing the depth
values and rotation direction of a particular
shape and rotation produces exactly the same
display. In other words, a convexity rotating to
the right produces exactly the same set of 2D
dot motions as a concavity rotating to the left.
Thus, u+-0l and u-+0r describe precisely
the same display type. There is also no difference
in display type among u000l, u000r, d000l
and d000r. This results in a total of 53 distinct
display types.
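
The count of 53 display types can be checked by enumeration. The sketch below is illustrative only; the string encoding of names (triangle, three depth signs, rotation letter) and the helper `canonical` are conventions assumed for the example.

```python
from itertools import product

FLIP = {"+": "-", "-": "+", "0": "0"}

def canonical(name):
    """Map a name such as 'u+-0l' to a representative of its display type."""
    tri, depths, rot = name[0], name[1:4], name[4]
    if depths == "000":
        return "flat"  # u000l, u000r, d000l and d000r are the same display type
    # Reversing depth signs and rotation direction yields the same 2D display.
    mirrored = tri + "".join(FLIP[c] for c in depths) + ("r" if rot == "l" else "l")
    return min(name, mirrored)

names = [t + "".join(d) + r
         for t in "ud" for d in product("+0-", repeat=3) for r in "lr"]
display_types = {canonical(n) for n in names}
print(len(names), len(display_types))  # 108 53
```

The 104 non-flat names pair off into 52 types, and the four flat names collapse into one, giving 53.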

These experiments used 54 white-on-gray dot
displays, including two instantiations of the flat
stimulus u000 (with different dot placements)
and one instantiation of each other display type.
Each set of dots was windowed to a display area
of 182 x 182 pixels (corresponding to the s x s
square), with dots presented as single luminous
pixels.

When the dots on the surface of a shape move
back and forth in the display, the local dot
density changes as the steepness of the hills and
valleys changes (with respect to the line of
sight). In previous work (Sperling et al., 1989),
we showed that this density cue is neither
necessary nor sufficient for the perception of
depth. However, it is a weak cue which one of
three highly trained subjects was able to use for
modest above-chance performance when it was
presented in isolation. In other words, changing
dot density is an artifactual cue to the task. As
in previous experiments, we remove this cue by
deleting or adding dots as needed throughout
the display in order to keep local dot density
constant. As a result of this manipulation, all
displays had approx. 300 dots visible in the
display window. The removal of the density cue
results in a small amount of dot scintillation
that neither lowers performance substantially
nor appears to be useful as an artifactual cue
(Sperling et al., 1989, 1990).

Other tokens. The 54 stimuli described so far
consisted of luminous dots moving to and fro on
a less luminous background. All other stimuli
were based upon these displays. First, three
conditions involved changes of the token that
carried the motion. The moving dots were replaced
with disks, patterned disks, or wires. We
refer to the dot, wire, and disk conditions as
white-on-gray stimuli, and the patterned disks
as pattern-on-gray.

To create a disk stimulus, a dot stimulus is
modified in the following way. Each luminous
dot in the stimulus is replaced with a 6 x 6 pixel
luminous diamond centered on the dot
(Fig. 2b), which appears disk-like from the
viewing distance used in the experiment. A
sample image of white-on-gray disks is depicted
in Fig. 2c, and is based on the white-on-gray dot
stimulus frame shown in Fig. 2a.

The pattern-on-gray disk stimuli are generated
in a similar fashion. The 6 x 6 diamond
consists of 24 pixels which are a mixture of
black and white (12 of each). These are displayed
on an intermediate gray background.
The diamond pattern and a sample stimulus
frame are shown in Fig. 2d and e, respectively.
Note that the diamond pattern has an equal
number of black and white pixels in each row.
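
A 6 x 6 diamond with 24 pixels is consistent with row widths of 2, 4, 6, 6, 4 and 2 pixels. The sketch below builds such a token under that assumption and fills it with a pattern that is balanced in every row. The actual black and white arrangement is the one shown in Fig. 2d; the alternating arrangement used here is a hypothetical stand-in.

```python
import numpy as np

ROW_WIDTHS = [2, 4, 6, 6, 4, 2]  # assumed row widths; they sum to 24 pixels

def diamond_mask(size=6):
    """Boolean mask of the diamond-shaped token inside a size x size box."""
    mask = np.zeros((size, size), dtype=bool)
    for row, width in enumerate(ROW_WIDTHS):
        start = (size - width) // 2
        mask[row, start:start + width] = True
    return mask

def patterned_diamond():
    """+1 = white, -1 = black, 0 = outside the token (background gray)."""
    mask = diamond_mask()
    pattern = np.zeros(mask.shape, dtype=int)
    for row in range(mask.shape[0]):
        cols = np.flatnonzero(mask[row])
        # Hypothetical fill: alternate white and black along each row.
        pattern[row, cols] = np.where(np.arange(cols.size) % 2 == 0, 1, -1)
    return pattern
```

Since every row has an even width, any fill that alternates within a row gives equal numbers of black and white pixels per row, and therefore a mean luminance equal to the gray background.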

Other stimuli were based on "wires". Each
dot was connected by a straight line (subject to
the pixel sampling density) to all neighbors that
were at a 2D distance no greater than 15.5 pixels
(Fig. 2f). Note that a vector is drawn between
two points based on their distance in the image,
not on their simulated 3D distance. Since the
lines were straight, when set in motion they
objectively define a thickened surface with lines
cutting through the interior of each bump and
concavity. This may have yielded a perceived


Fig. 2 (opposite). Stimulus display generation for expt 1. (a) A single frame of a white-on-gray dots
stimulus. All displays shown in this figure are based on this stimulus frame. (b) The diamond shape used
to generate the disks from the dots. (c) A white-on-gray disks stimulus frame. (d) The patterned diamond
for the pattern-on-gray condition. (e) A pattern-on-gray frame. (f) A white-on-gray wires frame. All pairs
of dots in Fig. 2a were connected whose inter-point distance was less than 15.5 pixels. (g) A frame of
dynamic-on-gray dots. In this condition each dot was painted black or white randomly and independently
with probability of 0.5 for each color. (h) A frame of dynamic-on-gray disks. The same procedure as in
(g) was applied to each pixel lying in each disk. (i) A frame of dynamic-on-gray wires. (j) A frame of
dynamic-on-static disks. For both dynamic-on-static conditions (disks and wires), the tokens and the
background consisted of random dot noise, and so the tokens cannot be discerned from a single static
frame. (k) A frame of the pattern-on-static condition. This frame contains 300 copies of the pattern in
(d) on a static noise background. The camouflage is quite effective. (l) An enlargement of the central
portion of (k) with the patterned disks emphasized.
(tessellated) surface having slightly less relative
depth than the base surface. The choice of 15.5
pixels as the criterion for drawing a line was a
compromise set in order to make sure that all
stimulus dots became an endpoint to at least
one line, and that no line was so long as to
excessively cut through the simulated surface.
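
The wire construction rule can be sketched as a pairwise distance test in the image plane. The function names below are hypothetical, and the helper `isolated_dots` is included only to illustrate the stated requirement that every dot be the endpoint of at least one line.

```python
import numpy as np

def wire_pairs(points_xy, max_dist=15.5):
    """Index pairs (i, j), i < j, of dots whose 2D image distance is <= max_dist."""
    pts = np.asarray(points_xy, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    i, j = np.nonzero(np.triu(dist <= max_dist, k=1))
    return list(zip(i.tolist(), j.tolist()))

def isolated_dots(n_points, pairs):
    """Dots that are not an endpoint of any wire."""
    connected = {idx for pair in pairs for idx in pair}
    return sorted(set(range(n_points)) - connected)
```

Distances are measured on the projected (x, y) positions only, which matches the statement that lines are drawn according to image distance rather than simulated 3D distance.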

The white-on-gray disks and pattern-on-gray
disks were based on the dot stimuli. The same
exact instantiations were used in all three conditions.
The nth frame of a given shape and
rotation consisted of either dots, disks or patterned
disks centered on the same set of image
positions. For the wire stimuli, a new set of 54
instantiations was made.

Dynamic-on-gray. Three types of stimuli
were used to explore the motion of patches of
dynamic noise moving on a gray background.
These stimuli are microbalanced, as we discussed
in the previous section. These stimuli are
derived from the dot, disk, and wire stimuli. To
produce a dynamic-on-gray stimulus from a
white-on-gray stimulus, simply change the luminance
of each white pixel in each stimulus frame
(i.e. the foreground or token pixels) to black
randomly and independently with probability
0.5. Thus foreground pixels undergo random
contrast polarity alternation while background
pixels are gray (i.e. have zero contrast). Sample
frames are illustrated in Fig. 2g, h and i.

Dynamic-on-static. Two types of stimuli were
used to explore the motion of patches of
dynamic noise moving on a static noise background.
This class of stimuli is also microbalanced
(Chubb & Sperling, 1988b). We derive
dynamic-on-static stimuli from the disk and
wire stimuli. The foreground pixels consist of
dynamic noise, just as in the previous dynamic-on-gray
case. The background pixels consist of
a static frame of patterned texture, where each
pixel is randomly chosen to be either black or
white with a probability of 0.5, just as the
dynamic noise is. If a given pixel is a background
position for two successive frames,
then its color does not change. If that position
is a foreground pixel in either or both frames,
then there is a 50% chance that its color will
change. A single frame of dynamic-on-static
stimulus is simply a frame of random dot noise
(Fig. 2j). The motion-carrying tokens are not
discernible from a single frame. Rather, the
areas of moving dynamic noise define the
foreground tokens.
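
The frame-to-frame rule for dynamic-on-static stimuli can be sketched as below. This is one possible reading of the rule, not the authors' implementation: the function name and the mask representation are assumptions, and pixels are coded +1 (white) and -1 (black).

```python
import numpy as np

def dynamic_on_static(masks, rng):
    """Build dynamic-on-static frames from token masks.

    masks: boolean array (n_frames, H, W), True where a token covers the pixel.
    Returns an integer array of the same shape with values +1 or -1.
    """
    masks = np.asarray(masks, dtype=bool)
    frames = np.empty(masks.shape, dtype=int)
    frames[0] = rng.choice([-1, 1], size=masks.shape[1:])
    for k in range(1, len(masks)):
        # Pixels that are foreground in either frame get a fresh random color,
        # so each such pixel changes with probability 0.5.
        redraw = masks[k - 1] | masks[k]
        fresh = rng.choice([-1, 1], size=masks.shape[1:])
        # Pixels that are background in both frames keep their color.
        frames[k] = np.where(redraw, fresh, frames[k - 1])
    return frames
```

Any single frame produced this way is uniform random noise, so the tokens are defined only by where the noise changes over time.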

Contrast polarity alternation. Three stimulus
conditions involved contrast polarity alternation.
This stimulus manipulation was explored
thoroughly for dot stimuli in the preceding
paper (Dosher et al., 1989b). In this condition,
the motion-carrying tokens alternate from white
to black to white again on successive frames, all
against a background of intermediate gray.
Contrast polarity alternation was used with
dots, disks, and wires, resulting in three polarity
alternation conditions.

Pattern-on-static. The final condition involves
pattern camouflage. This condition is
derived from the pattern-on-gray stimuli. The
gray background is replaced with a frame of
static random dot noise. In other words, the
patterned disk tokens move to and fro in front
of a screen of static random dots, occluding it
(and occasionally each other) as they pass by. A
frame of this stimulus condition is pictured in
Fig. 2k, and enlarged in Fig. 2l, where we have
artificially highlighted the patterned disks for
comparison to the pattern kernel shown in Fig.
2d. There are approx. 300 patterned disks in
Fig. 2k. As you can see, the camouflage is quite
effective. When the patterned disks move, as one
might expect, they are easily visible (Julesz,
1971).

Display details. There are a total of 13 conditions
(3 white-on-gray, 1 pattern-on-gray, 3
contrast polarity alternation, 3 dynamic-on-gray,
2 dynamic-on-static, and 1 pattern-on-static).
There were 54 distinct displays for each
of the 13 conditions. In all conditions, the
displays are windowed to an area of 182 x 182
pixels. Displays were computed using the HIPS
image processing software (Landy, Cohen &
Sperling, 1984a, b), and displayed by an Adage
RDS-3000 image display system.

Subjects MSL and JBL viewed these stimuli
on a Conrac 7211C19 RGB color monitor. Only
the green gun was used, and so stimuli appeared
as bright green and black pixels (as dots, disks,
lines or noise) on a green background of intermediate
luminance. The stimuli subtended
3.7 x 4.2 deg. Stimuli were viewed monocularly
through a dark viewing tunnel, using a circular
aperture which was slightly larger than the
stimuli.

Subject LJJ viewed the stimuli on a U.S.
Pixel PX15 black and white monitor with
a P4-like phosphor. Here, stimuli subtended
2.9 x 2.9 deg, and appeared as white and black
pixels on an intermediate gray background.
Stimuli were viewed monocularly through a
circular aperture in cardboard which approximately
matched the hue of the displays, and
which had approximately the same luminance as
the stimulus background.

Each stimulus consisted of 30 stimulus
frames. These were presented at a 60 Hz frame
rate. Each frame was repeated four times, resulting
in an effective rate of 15 new stimulus frames
per second. Each stimulus lasted 2 sec. A trial
sequence consisted of a fixation spot, a blank
interval, the 30 frame stimulus, and a blank. The
fixation and blank lasted either for 1 sec each
(subjects MSL and JBL), or 0.5 sec each (subject
LJJ). The background luminance remained constant
throughout the trial sequence. Subjects
were free to use eye movements to actively
explore the display. Stimuli were viewed from a
distance of 1.6 m. After each stimulus display,
subjects responded with the name of the shape
and rotation direction using either a computer
keyboard or response buttons.

Slightly different image luminances were used
for each subject. The background luminances for
subjects MSL, JBL and LJJ were 31.0, 40.0 and
45.0 cd/m², respectively. Since isolated luminous
pixels were used, the appropriate unit of
measurement is extra μcd/pixel for bright
pixels, and removed μcd/pixel for dark pixels, all
at a specified viewing distance (Sperling, 1971).
Stimuli were calibrated so that extra μcd/pixel
and removed μcd/pixel were equal. For subjects
MSL, JBL and LJJ, these were 13.2, 19.2
and 15.7 μcd/pixel, respectively, at a viewing
distance of 1.6 m. Contrasts were nominally
100%.

Procedure. There were 13 stimulus conditions.
For each condition, there were 54 stimuli (two
instantiations of the flat stimulus u000, and one
instantiation of each of the 52 other possible
distinct shape/rotation combinations). This resulted
in 702 stimuli, each of which was viewed
once by each subject. These 702 trials were
viewed in random order in six blocks of 117
trials. On a given trial, a stimulus was shown,
subjects keyed in their responses, and then
feedback was provided so that we measured
the best performance of which the subject
was capable. Each block lasted approx. 1 hr.
Subjects ran several practice sessions on the
white-on-gray dots condition before data
were collected. Given the mix of stimuli in
a given condition, guessing base rates for
the identification of shape and rotation direction
were between 1/53 (for a strategy of
random guessing) and 2/54 (for a strategy
of always answering u000l or one of its
equivalents).
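
The trial counts and guessing base rates quoted above follow from simple arithmetic, checked here as a short worked example. The variable names are illustrative only.

```python
from fractions import Fraction

# 3 white-on-gray, 1 pattern-on-gray, 3 polarity alternation,
# 3 dynamic-on-gray, 2 dynamic-on-static, 1 pattern-on-static
conditions = 3 + 1 + 3 + 3 + 2 + 1
displays_per_condition = 54            # 52 non-flat types + 2 flat instantiations
trials = conditions * displays_per_condition
trials_per_block = trials // 6

random_guess = Fraction(1, 53)         # one of 53 display types chosen at random
always_flat = Fraction(2, 54)          # 2 of the 54 displays per condition are flat

print(conditions, trials, trials_per_block)                     # 13 702 117
print(f"{float(random_guess):.1%} {float(always_flat):.1%}")    # 1.9% 3.7%
```

Chance performance therefore lies between roughly 2% and 4% correct, which is the baseline against which the percentages in the Results are judged.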
Fig. 3. Results of expt 1. Results are given for three subjects.
Different symbols in the bars represent different tokens
(large open dots for the disk and patterned disk tokens,
small solid dots for the dot tokens, and asterisks for the wire
tokens).


Results 

The results for the three subjects are summarized
in Fig. 3. Each performance measure given
here is the percent correct over 54 trials. We
discuss each class of stimulus condition in turn.

White-on-gray/Pattern-on-gray. As expected,
the performance on the three white-on-gray
and the one pattern-on-gray condition was
uniformly high. The tokens provided excellent
motion signals because they were moving rigid
areas of high contrast. It did not particularly
matter whether we used dots, as in our previous
studies, wires, as in the early wire-frame KDE
work (Wallach & O'Connell, 1953), disks, or
patterned disks. The disk and patterned disk
stimuli provided a very strong percept of depth,
although the disks did not undergo realistic
foreshortening as they rotated. In fact, the dot
stimuli gave the weakest percept of depth. Their
tokens had the least contrast energy (i.e. were
the smallest), and hence were harder to detect.
Subject JBL had the greatest difficulty in seeing
these small dots, and his results show a slight
drop in performance for the dot stimuli.

Dynamic-on-gray. The motion of a token
filled with dynamic random dot noise moving
on a gray background is microbalanced. In
other words, 1st-order motion detectors are
"blind" to this stimulus. The expected value of
the output of such a detector is zero (across
random realizations of the stimulus). Simple
2nd-order mechanisms (e.g. using rectification)
serve to reveal the true motion.
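
The notion of a microbalanced stimulus can be illustrated with a toy one-dimensional simulation. The sketch below is not a model from the paper: it uses a bare opponent correlator between neighbouring pixels as a stand-in for a 1st-order detector, full-wave rectification as the 2nd-order nonlinearity, and arbitrary values for the token length, frame count and number of realizations.

```python
import numpy as np

def moving_noise_token(n_frames, width, token_len, rng):
    """Token of random +1/-1 contrast on a zero-contrast (gray) background,
    stepping one pixel rightward per frame and re-randomized on every frame."""
    frames = np.zeros((n_frames, width))
    for t in range(n_frames):
        frames[t, t:t + token_len] = rng.choice([-1.0, 1.0], size=token_len)
    return frames

def opponent_correlator(frames):
    """Rightward-minus-leftward correlation between neighbouring pixels in
    successive frames; positive values signal rightward motion."""
    rightward = frames[:-1, :-1] * frames[1:, 1:]
    leftward = frames[:-1, 1:] * frames[1:, :-1]
    return float(np.sum(rightward - leftward))

def mean_outputs(n_trials=2000, seed=0):
    """Average detector output over random realizations, raw and rectified."""
    rng = np.random.default_rng(seed)
    raw, rectified = [], []
    for _ in range(n_trials):
        f = moving_noise_token(n_frames=10, width=32, token_len=6, rng=rng)
        raw.append(opponent_correlator(f))
        rectified.append(opponent_correlator(np.abs(f)))
    return np.mean(raw), np.mean(rectified)
```

Averaged over realizations, the raw output hovers near zero because the token's contrast signs are independent across frames, whereas the rectified input yields a consistently positive (rightward) output.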

The results for three subjects are somewhat
different. For two subjects (LJJ and JBL),
performance is always at or near chance (less
than 10% correct in all cases), although for
subject LJJ with the dynamic-on-gray dots the
performance is significantly above chance
(P < 0.05). On the other hand, for subject MSL,
performance is always well above chance

*In order to test the range of luminances over which polarity
alternation was effective we ran a control experiment
(using MSL and JBL as subjects) where a variety of
white pixel luminances were used with a given black pixel
luminance. We viewed a variety of dynamic-on-gray
displays varying the luminance values for the black and
white pixels independently over a wide range. We also
tested a variety of other luminance calibration procedures.
Dynamic-on-gray stimuli are only microbalanced
if the contrast energy of the white pixels is the
same as that of the black pixels. And, it is difficult to
calibrate the luminance of individual pixels embedded in
a complex display texture given that the desired pattern
is first low-pass filtered by the CRT video amplifier, and
then passes through the gun nonlinearity (see Mulligan
& Stone, 1989, for a full discussion of this point). Thus,
it was important to verify that our results were robust
over a range of luminance values overlapping the calibrated
equal contrast point.
To summarize, shape identification performance is
consistent with the results of expt 1 for a reasonably wide
range of white pixel luminances. Subject MSL consistently
performs at moderate levels, and subject JBL
consistently performs at or near chance. The luminance
levels yielding poor shape identification performance are
consistent with the levels that result in the weakest 3D
percept and are roughly consistent with the luminance
levels that are balanced (black pixel decrement vs white
pixel increment) for a variety of calibration displays. The
performance levels for dynamic-on-gray stimuli in expt
1 do not result from a miscalibration of luminance levels.


(up to 39% correct identification), but far less
than the nearly perfect (up to 98% correct) performance
with white or pattern tokens on gray.*

The 1st-order motion mechanisms are clearly
the most effective input to the KDE, since
manipulations that defeat 1st-order
mechanisms reduce performance substantially
for all subjects. The results for subject MSL
suggest that 2nd-order motion mechanisms can
also be used. On some trials, fragments of the
microbalanced stimuli did appear 3D to this
subject (one of the authors), especially in the
foveally-viewed portion of the stimulus. To raise
his performance level, he used sophisticated
guessing strategies based on active eye movements
and local measurements of motion or
three-dimensionality in the fovea, at a small
number of locations of the display. But these
strategies only serve to bring performance up to
mediocre levels in comparison with performance
with rigid white-on-gray motion.

Dynamic-on-static. The dynamic-on-static
manipulation also results in a microbalanced
stimulus. For the dynamic-on-static conditions,
performance is at chance level for all three
subjects, and for both wire and disk tokens. As with
the dynamic-on-gray conditions, the motion of
the tokens is visible. It is not particularly
difficult to detect the motion of an area of
dynamic noise on a static noise background
(Chubb & Sperling, 1988b). However, this sort
of motion engenders no shape percept whatever
under the conditions of our experiments.

Unlike dynamic-on-gray stimuli, dynamic-on-static
stimuli are not revealed by contrast
rectification. Detection of the motion of a region
of flicker requires more elaborate 2nd-order
mechanisms. Regions of flicker could first
be detected by applying a linear temporal filter
(such as differentiation), followed by rectification,
and then by application of a 1st-order
motion mechanism. Some such complex 2nd-order
motion detector exists in the human visual
system, since we are capable of seeing areas of
flicker move, including in the displays of our
experiment (at least with scrutiny). Yet, this
2nd-order motion detection system does not
support the structure-from-motion computation
for our dynamic-on-static stimuli.
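
The first two stages of such a mechanism (temporal differentiation, then rectification) can be sketched on a toy stimulus. Everything specific here is an assumption chosen for illustration: the frame size, the token (a 6-pixel-wide column of dynamic noise stepping 2 pixels per frame), and the use of a simple frame difference as the temporal filter.

```python
import numpy as np

def flicker_energy(frames):
    """Temporal differentiation followed by full-wave rectification."""
    return np.abs(np.diff(frames.astype(float), axis=0))

rng = np.random.default_rng(0)
n_frames, height, width = 8, 16, 40
static = rng.choice([-1, 1], size=(height, width))
frames = np.repeat(static[None], n_frames, axis=0)   # static noise background
for k in range(n_frames):
    # Hypothetical token: a column of fresh noise that steps 2 pixels rightward.
    frames[k, :, 2 * k:2 * k + 6] = rng.choice([-1, 1], size=(height, 6))

energy = flicker_energy(frames)        # shape (n_frames - 1, height, width)
profile = energy.mean(axis=1)          # flicker energy per image column
centroid = (profile * np.arange(width)).sum(axis=1) / profile.sum(axis=1)
```

The energy is zero wherever the static background persists across two frames and nonzero only where the token is or has just been, so the resulting energy profile is a drifting luminance-like signal that a 1st-order mechanism could in principle track.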

Prazdny (1986) reached the opposite conclusion
using dynamic-on-static displays representing
simple wire objects rotating in a
tumbling motion. Each object contained five
wires, and subjects were required to identify the
object among six alternative wire-frame objects.
Performance was quite high
in this task for five subjects. Although we have
some reservations about the experimental
method employed by Prazdny, we have generated
similar displays in our laboratory, and our
dynamic-on-static wire-frame displays do yield
a shape percept when displays are restricted to
a small number of wires.

The most likely explanation of the difference
between our results and those of Prazdny involves
the difference in spatial resolution required
by each task. Chubb and Sperling
(1988a) have demonstrated that 2nd-order
motion systems have less spatial resolution than
the 1st-order mechanisms, and that their resolution
drops precipitously with increases in retinal
eccentricity. In our displays, motion was
about a vertical axis using parallel perspective,
and hence all motion was along the horizontal.
There could be as many as 10 or 20 disks or
wires in a given row of the image to resolve. Our
displays did not yield a global percept of optic
flow, but motion was perceived foveally with
scrutiny. This is entirely consistent with Chubb
and Sperling's observation. Prazdny did not
give precise details about his stimuli, but it was
clear that along a given motion path there were
only two or three wires to resolve across his far
larger display. Performance was so low in our
dynamic-on-static conditions because too much
spatial acuity was required of the 2nd-order
system that detects the motion of flickering
regions.

How useful for perception of shape is a
display of dynamic noise figures moving on a
static noise background? We have examined a
large number of disk and (thick) wire displays
in order to span the gap of spatial resolution
between Prazdny's displays and our own. With
our 3 x 3 deg display size, a shape percept can
only be achieved by using a very small number
of tokens (around 5-10). These displays consisted
of rotating disk tokens. Cavanagh and
Ramachandran (1988) suggest an alternative
explanation of the difference between our results
and those of Prazdny. They consider the crucial
difference to be that the objects portrayed in the
Prazdny displays were connected (one long wire
figure), whereas our displays consisted of separate
disk tokens. With our wire displays, almost
no 3D percept was achieved for the dynamic-on-static
condition. In addition, we were able to
achieve a 3D percept with displays of a small
number of dynamic-on-static disks. Thus, we
feel that low spatial resolution in the 2nd-order
motion system (rather than unconnected
tokens) is the likely explanation for failure of
KDE.

Contrast polarity alternation. Performance is
quite poor for the contrast polarity-alternating
dots, as it was in the previous paper (Dosher et
al., 1989b). For two subjects (JBL and LJJ)
performance is at chance or insignificantly
above chance. For subject MSL, performance is
low (11% correct) but significantly above
chance (P < 0.05). On the other hand, when the
token is changed to disks or wires, performance
rises substantially. Contrast polarity alternation
is not as devastating a stimulus manipulation
for disks and wires as it is for dots.

For 1st-order motion detection mechanisms
such as the Reichardt detector, contrast polarity
alternation causes the strongest responses to be
in the wrong direction. Yet, the intended motion
can be detected quite accurately if a 2nd-order
detector is used that first applies a luminance
nonlinearity followed by a Reichardt detector.
The primary difference between the dots on the
one hand, and the disks and wires on the other,
is that the disks and wires have more pixels
illuminated. In other words, they have more
contrast energy, and in particular they have
more energy at lower spatial frequencies. Thus,
the disk and wire stimuli should stimulate both
the 1st- and 2nd-order motion detection systems
more strongly, resulting in stronger incorrect
direction information from the 1st-order
system as a whole, but also stronger information
from the 2nd-order system, and stronger
directional information in those selected 1st-order
frequency bands which signal the correct
direction.
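
The reversal produced by polarity alternation can be seen in a toy one-dimensional example. The sketch below uses a bare opponent correlator between neighbouring pixels as a simplified stand-in for a Reichardt detector and squaring as the luminance nonlinearity; the token size, frame count and function names are assumptions for illustration.

```python
import numpy as np

def opponent_correlator(frames):
    """Rightward-minus-leftward correlation between neighbouring pixels in
    successive frames; positive values signal rightward motion."""
    rightward = frames[:-1, :-1] * frames[1:, 1:]
    leftward = frames[:-1, 1:] * frames[1:, :-1]
    return float(np.sum(rightward - leftward))

def alternating_token(n_frames=10, width=32, token_len=6):
    """Token on a zero-contrast background that steps one pixel rightward per
    frame while its contrast alternates white, black, white, ..."""
    frames = np.zeros((n_frames, width))
    for t in range(n_frames):
        polarity = 1.0 if t % 2 == 0 else -1.0
        frames[t, t:t + token_len] = polarity
    return frames

frames = alternating_token()
first_order = opponent_correlator(frames)          # sign opposite to the motion
second_order = opponent_correlator(frames ** 2)    # nonlinearity first: correct sign
print(first_order, second_order)
```

The raw correlator reports leftward motion for a rightward-moving token because every cross-frame product pairs opposite contrasts, while squaring the contrast before correlation restores the correct sign.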

It is interesting to note that a large number of
the errors made by observers with polarity-alternating
stimuli were errors in the direction of
rotation only, with the shape specified correctly.
For example, for a stimulus which had as
correct answers either u+-0l or u-+0r, the
subject incorrectly responded with u+-0r or
u-+0l, rather than with any of the 104 other
possible incorrect responses. This effect was
largest for the disk tokens. In a separate control
experiment, for contrast polarity-alternating
disk stimuli, 39% of the errors made by subject
MSL were only an error in the specification of
direction, compared to 1.4% direction errors
for the dynamic-on-gray conditions. For subject
JBL, the corresponding values were 48% and
5.6%. For the polarity-alternating disks, on
trials when subject MSL correctly identified the
shape, there was a 33% chance that he would
misidentify the direction of rotation (for JBL:
[illegible]). We believe that accurate shape identification
in this condition primarily reflects responses
constructed from selected 1st-order
information. One strategy was simply to specify
the opposite rotation direction to that which
was perceived! The displays did, however, occasionally
appear to be 3D with the correct
direction of motion (at certain times during the
rotation, or close to the location to which the
eyes were directed), indicating a residual 2nd-order
motion input to the KDE system. The fact
that these displays only appeared foveally to be
rotating in the correct direction, and then only
using the larger tokens, is consistent with a
2nd-order motion detection system with low
contrast sensitivity and low spatial resolution
(as has been demonstrated by Chubb &
Sperling, 1988b), and more sensitive in the fovea
(Chubb & Sperling, 1988a). In summary, we
have some indication that 2nd-order motion
detection mechanisms can be used to derive 3D
structure, but they are far less robust and have
poorer spatial resolution than 1st-order motion
mechanisms.

Pattern-on-static. For all three subjects performance
with pattern-on-static displays is quite
poor (9, 26 and 33% correct), although it is
significantly above chance levels in all cases
(P < 0.05). This poor performance results from
a mismatch of resolution and temporal
sampling. The patterned disks are quite detailed,
i.e. high frequency. The disks are 6 pixels in
diameter, and can move as far as 8.3 pixels in
one frame. This speed is only achieved by disks
at the top of a peak when in the middle of the
display (i.e. near frame numbers 0, 15 and 29),
but many disks are moving 3-5 pixels per frame.
High frequency spatial filters which are required
to identify the disks must correlate across
frames with filters that are far more than 90 deg
away in the phase of their peak spatial frequency.
A typical 1st-order detector will not
compare spatial regions that far apart in order
to avoid spatio-temporal aliasing (van Santen &
Sperling, 1984). Thus, the clearest motion signals
are coming from the slower areas in the
display, which are the least useful for discriminating
the shapes. We have examined pattern-on-static
displays with finer temporal sampling
(60 new frames per sec, as opposed to 4 repaints
of 15 new frames per sec used in the experiment),
and they give a strong impression of



three-dimensionality. Thus, poor performance
in the task resulted from undersampling in time
of the stimuli, which interferes with 1st-order
(and some 2nd-order) motion mechanisms, and
good KDE can result from the motion of tokens
which are camouflaged when at rest.

We have also examined dynamic-on-static
displays with finer temporal sampling (60 new
frames per sec). These displays yield no impression
of three-dimensionality. The poor results
for dynamic-on-static displays do not
result from insufficient sampling in time. Also,
since finely sampled pattern-on-static displays
do appear 3D, poor performance with dynamic-on-static
displays does not result from the
camouflage of the tokens when at rest. Rather,
dynamic-on-static displays yield no effective
KDE because of the low resolution of the
2nd-order system required to analyze the
motion.

EXPERIMENT 2. TWO-FRAME KDE

The first experiment shows that accurate performance in
shape identification is dependent upon a global
(primarily 1st-order) optic flow. If a stimulus
manipulation makes that optic flow noisy or otherwise
interferes with the optic flow computation, there is
little or no KDE. This occurs even though foveal scrutiny
does reveal the motion in these displays.

If the percept of surface shape depends upon a global
optic flow, then we should be able to get reasonable
shape identification performance from any stimulus that
results in a strong percept of optic flow. In particular,
the extended (2 sec) viewing conditions of expt 1 should
not be necessary. Two frames are obviously the minimum
number of frames that can yield a percept of motion, and
two frames should suffice. In the second experiment, we
investigate the accuracy of performance in the shape
identification task for two-frame displays.

Method 

Subjects. There were two subjects in this experiment.
One was an author, and the other was a graduate student
naive to the purposes of this experiment. Both had normal
or corrected-to-normal vision. There were slight
differences in the conditions for each of the two
subjects. These will be pointed out below.

Stimuli and apparatus. The stimuli were similar to the
white-on-gray dot stimuli from expt 1. Stimuli were
generated from the same set of 3D shapes, using the same
dot densities, and projected in the same way. The local
dot density was kept constant using the same
scintillation procedure. New stimuli were computed, two
of the flat shape, and one of each of the other 52 shapes,
resulting in 54 displays.

Each display consisted of 11 frames, rotating from 20 deg
left to 20 deg right in increments of 4 deg per frame.
The middle frame (number 6) was face-forward, as was the
first frame of each display in expt 1. Two-frame stimuli
consisted of a presentation of the middle frame followed
by one of the other 10 display frames. This resulted in
either a leftward or rightward rotation of 4-20 deg
between the two frames of the display. A single trial
display consisted of 0.5 sec of a cue spot, 0.5 sec blank,
the first frame, an inter-stimulus blank interval (or
ISI), the second frame, and a blank. Each stimulus frame
was repainted four times at 60 Hz, for a total duration
of 67 msec. We define the ISI to be the time interval
between the onset of the last painting of the first
stimulus frame and the onset of the first painting of the
second stimulus frame. For example, when no blank frames
were used, the ISI was 16.7 msec. Displays were
182 x 182 pixels, and were presented using the same
apparatus and viewing conditions as for subject LIJ in
expt 1. The background luminances for subjects MSL and
LIJ were 15.6 cd/m² and 5.0 cd/m², respectively. The
corresponding dot luminosities were 26.8 and 15.7 extra
µcd/dot, respectively. Nominal contrasts were huge (i.e.
nominal Weber contrasts of 500% or more).
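The timing figures above follow directly from the 60 Hz refresh rate. The sketch below reproduces them under one stated assumption: that the blank interval is always a whole number of refreshes (the function and constant names are illustrative, not taken from the original software).

```python
REFRESH_HZ = 60.0
REPAINT_MS = 1000.0 / REFRESH_HZ   # one refresh, about 16.7 msec
PAINTS_PER_FRAME = 4               # each stimulus frame is painted four times


def frame_duration_ms():
    """Duration of one stimulus frame (four repaints)."""
    return PAINTS_PER_FRAME * REPAINT_MS


def isi_ms(n_blank_refreshes):
    """ISI as defined in the text: onset of the last painting of frame 1 to
    onset of the first painting of frame 2.
    ASSUMPTION: blanks are inserted as whole refreshes."""
    return (n_blank_refreshes + 1) * REPAINT_MS


def sequence_duration_ms(n_blank_refreshes):
    """Total time from the first painting of frame 1 to the end of frame 2."""
    return 2 * frame_duration_ms() + n_blank_refreshes * REPAINT_MS


for n in range(5):
    print(n, "blank refreshes: ISI =", round(isi_ms(n), 1), "msec,",
          "sequence =", round(sequence_duration_ms(n), 1), "msec")
```

With no blank refreshes this gives a 16.7 msec ISI, and with four it gives 83.3 msec, which matches the range of ISIs used in the experiment.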

Procedure. The task was shape and rotation
identification. Subjects keyed their responses using
response buttons, and received feedback on the display
after their response. Three groups of trials were run. In
the first, the ISI was 16.7 msec, and rotation angle
between frames was varied from 4 to 20 deg. Since the
second frame could be chosen from either the frames
preceding or succeeding the middle frame (rotation to the
left or right), this resulted in 540 possible stimuli
(54 displays, 2 directions, 5 rotation angles). These
were run in random order in 4 blocks of 135 trials. In
the second group of trials, rotation was kept constant at
4 deg. ISI ranged from 16.7 to 83.3 msec. This again
resulted in 540 trials presented in random order in 4
blocks of 135 trials. In the third group of trials, both
rotation angle and ISI were varied. The ISIs were either
16.7 or 33.3 msec. For subject MSL, the rotation angles
were either 4 or 12 deg. For LIJ, they were either 8 or
12 deg. These four conditions (two rotation angles by two
ISIs) resulted in 432 trials which were presented in
random order in 4 blocks of 108 trials.

Fig. 4. Results of expt 2. Data for two subjects are
shown. Error bars indicate ±1 SEM. (A) Shape-and-rotation
identification accuracy as a function of the angle of
rotation between the two frames. ISI was 16.7 msec.
(B) Shape-and-rotation identification accuracy as a
function of the duration of a blank inter-stimulus
interval (ISI). Rotation angle was 4 deg. (C) The two
manipulations used in the same experiment. Note the lack
of interaction.
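The trial counts in the Procedure can be checked by enumerating the factorial designs. In the sketch below, the five ISI values for the second group are an assumption (whole refreshes from 16.7 to 83.3 msec); the text gives only the range and the total of 540 trials. The third group uses the angles for subject MSL.

```python
from itertools import product

N_DISPLAYS = 54                      # 2 flat + 52 other shapes
DIRECTIONS = ("left", "right")
ANGLES_DEG = (4, 8, 12, 16, 20)      # rotation between the two frames
# ASSUMPTION: five ISIs at whole-refresh steps; only the range is stated.
ISIS_MS = (16.7, 33.3, 50.0, 66.7, 83.3)

# Group 1: ISI fixed at 16.7 msec, rotation angle varied.
group1 = list(product(range(N_DISPLAYS), DIRECTIONS, ANGLES_DEG))
# Group 2: rotation fixed at 4 deg, ISI varied.
group2 = list(product(range(N_DISPLAYS), DIRECTIONS, ISIS_MS))
# Group 3: two rotation angles by two ISIs (angles shown for subject MSL).
group3 = list(product(range(N_DISPLAYS), DIRECTIONS, (4, 12), (16.7, 33.3)))

for name, trials, n_conditions in (("group 1", group1, 5),
                                   ("group 2", group2, 5),
                                   ("group 3", group3, 4)):
    print(name, "trials:", len(trials),
          "per block:", len(trials) // 4,
          "per condition:", len(trials) // n_conditions)
```

Each condition ends up with 108 trials in all three groups, which is the number of trials behind each data point in Fig. 4.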

Results 

The results are shown in Fig. 4. Each data point is the
percent correct over 108 trials. As is evident from the
figure, shape identification can be quite high for these
minimal motion displays (for similar observations using
different experimental methodology, see Braunstein,
Hoffman, Shapiro, Andersen & Bennett, 1987; Lappin,
Doner & Kottas, 1980; Mather, 1989; and Petersik, 1980).
For an ISI of 16.7 msec (Fig. 4A), this entire sequence
lasted only 133 msec. Yet, performance was as high as
51.6% for subject LIJ, and 88.9% for subject MSL (62.8%
and 94.2% of their white-on-gray dots performance in
expt 1, respectively). Two frames of moving dots are
sufficient for accurate, although not perfect,
performance in this shape identification task. Since
these experiments were first reported (Landy, Sperling,
Dosher & Perkins, 1987a; Landy, Sperling, Perkins &
Dosher, 1987b), Todd (1988) has also shown above-chance
KDE performance for two-frame stimuli, although in his
paradigm the two frames are repeated several times before
a response is made.

Rotation angle and ISI. Performance as a function of
rotation angle between the two frames is given in
Fig. 4A. Performance decreases with increasing angle of
rotation for subject MSL. For subject LIJ, performance
reaches a peak at 8 deg, and decreases for smaller and
larger rotations. The decrease in performance with larger
rotation angles is to be expected, since the
correspondence problem becomes increasingly difficult as
dots move farther from their initial positions. One might
also expect performance to drop as rotation angle
decreases to zero. At extremely small rotation angles,
the remaining motion would fall below threshold. In our
displays, the drop with small rotation angles might be
expected to occur even sooner as the small motions in the
display became corrupted by poor spatial sampling
(inter-pixel distance was approx. 1 min arc). This drop
was only seen in the data of LIJ, and presumably would be
seen in those of MSL if he had been tested using smaller
rotations.

In a previous paper (Dosher et al., 1989b), we found that
adding a blank interval between successive frames of a
30 frame KDE stimulus reduced shape identification to
near chance performance. This was explained by reduction
of power in the stimulus to the 1st-order system. This
effect is also seen here, where performance decreases
monotonically with increasing ISI (Fig. 4B). Subject LIJ
performs at chance levels with a 50 msec or greater ISI,
while subject MSL is still slightly above chance
performance with an 83.3 msec ISI.

Time and distance. In the previous two groups of trials,
there was a confounding between the stimulus manipulation
(rotation angle or ISI) and dot velocity. Greater
rotation angles at a fixed (16.7 msec) ISI produced
greater velocities. Similarly, greater ISIs at a fixed
4 deg rotation angle resulted in smaller velocities. If
performance were simply a function of velocity, then
rotation angle and ISI should trade off. In Fig. 4C we
present the results of varying both ISI and rotation
angle factorially. We used a different set of rotations
for subject LIJ than MSL based on the results in Fig. 4A,
so that for both subjects the performance was expected to
decrease with increasing rotation angles. As can be seen
in the figure, the two variables do not trade off as
would be expected if performance were only a function of
velocity, or rotation speed. Increasing rotation angle
increases the difficulty of the correspondence problem.
Increasing ISI causes increasing problems for the motion
detection system. Both manipulations degrade performance
in an additive fashion. This observation contradicts
Korte's (1915) 3rd law of apparent motion perception,
which states that an increase in ISI must be counteracted
by an increase in distance traveled for strong apparent
motion. In Fig. 4C, Korte's law predicts a cross-over
interaction, which is strongly disconfirmed. However,
Burt and Sperling (1981) show that time and distance have
independent additive effects on the strength of the
apparent motion of dot stimuli, which agrees with the
present results.
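The confound between the two manipulations and rotation speed, and the notion of an additive (non-interacting) pattern, can be illustrated numerically. The sketch below computes the rotation speed for the 2 x 2 design used with subject MSL and then evaluates an interaction contrast on accuracy values. The accuracies are entirely hypothetical, invented here to show what an additive pattern looks like; they are not the data of Fig. 4C.

```python
from itertools import product

REFRESH_MS = 1000.0 / 60.0


def rotation_speed_deg_per_s(angle_deg, isi_ms):
    """Mean rotation speed implied by a rotation angle and an ISI."""
    return angle_deg / (isi_ms / 1000.0)


def interaction_contrast(acc, angles, isis):
    """Difference of differences for a 2 x 2 design.
    Zero means the two factors combine additively."""
    (a1, a2), (i1, i2) = angles, isis
    return (acc[(a2, i1)] - acc[(a1, i1)]) - (acc[(a2, i2)] - acc[(a1, i2)])


angles = (4, 12)                       # deg, as used for subject MSL
isis = (REFRESH_MS, 2 * REFRESH_MS)    # 16.7 and 33.3 msec

speeds = {(a, i): rotation_speed_deg_per_s(a, i) for a, i in product(angles, isis)}
for (a, i), s in sorted(speeds.items()):
    print(a, "deg at", round(i, 1), "msec ->", round(s), "deg/sec")

# HYPOTHETICAL accuracies built from two independent costs (illustration only).
base, angle_cost, isi_cost = 0.90, 0.15, 0.25
hypothetical_acc = {
    (a, i): base - angle_cost * (a == angles[1]) - isi_cost * (i == isis[1])
    for a, i in product(angles, isis)
}
print("interaction contrast:",
      round(interaction_contrast(hypothetical_acc, angles, isis), 6))
```

If accuracy depended on speed alone, lengthening the ISI at the larger angle would move the speed back toward that of the smaller angle and should help rather than hurt; an interaction contrast near zero with both main effects negative is the additive pattern described in the text.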

KDE from optic flow. Accurate KDE performance requires
a global optic flow. When that optic flow is produced by
a minimal motion stimulus (a two-frame display), the
shape percept may be fragile and easily degraded by a
variety of stimulus manipulations. The stimuli are quite
brief in this paradigm and, by subject reports, appear as
a collection of dots moving at various speeds, i.e. "look
like" an optic flow. On some trials, only patches of
planar motion are perceived, and the shape response is
generated cognitively. On other trials, a 3D surface is
perceived. On some trials the optic flow is perceived and
so is the shape, but the shape percept is only "felt"
after the display is over. As we discussed extensively in
our first article on the shape identification task
(Sperling et al., 1989), KDE is inextricably tied with
the percept of an optic flow. It can be very difficult to
differentiate empirically between a judgment based on a
3D percept and performance based on an alternative
strategy (computationally equivalent to that required for
KDE) using a remembered set of 2D velocities.

Reasonably accurate performance on the
shape-and-rotation identification task results from only
two frames of 300 points. In the computer vision
literature, there have been several studies of the
structure-from-motion problem resulting in theorems of
the following form: "m views of n points under the
following restrictions of the motion path suffice to
determine the 3D structure up to a reflection" (Bennett &
Hoffman, 1985; Hoffman & Bennett, 1985; Hoffman &
Flinchbaugh, 1982; Ullman, 1979). It has been suggested
that these minimal conditions for structure from motion
also govern human perception (Braunstein et al., 1987;
Petersik, 1987). The particular models just mentioned do
not have any prediction concerning performance in the
300 points, 2 views situation used here. An exception is
a recent paper by Bennett, Hoffman, Nicola and Prakash
(1989), where it is shown that there is a one parameter
family of possible interpretations for two frames of four
or more points. This family is parameterized by the slant
of the axis of rotation (as in the "isokinescopic
displays" described by Adelson, 1985), and the paper does
not deal explicitly with rotation axes in the image
plane, as used here. On the other hand, models that
compute 3D structure based only upon a single velocity
field do allow for this performance (Longuet-Higgins &
Prazdny, 1980; Koenderink & van Doorn, 1986). We take our
experimental results as evidence for optic flow-based
methods for the KDE, as opposed to models requiring three
or more views. In particular, our results strongly rule
out models that require measurement of acceleration in
addition to velocity (e.g. Hoffman, 1982).


Structure-from-motion computation may improve its 3D
representation with additional information (e.g. with
additional frames; Grzywacz, Hildreth, Inada & Adelson,
1988; Hildreth & Grzywacz, 1986; Landy, 1987; Ullman,
1984). The shape in our two-frame displays does not
always appear to have the depth extent that results from
the 30 frame displays of expt 1, and two-frame
performance is reduced relative to 30-frame performance.
The shape identification task can be solved by knowing
only the sign of depth and direction of motion in each
spatial location (up to a reflection), without accurately
estimating either velocity or the amount of depth.

DISCUSSION

Two experiments investigated the type of motion detection
mechanism used as an input to the structure-from-motion
system. Performance in the shape-and-rotation
identification task was accurate regardless of the token
used to carry the motion, as long as that token was
presented with constant contrast polarity (the
white-on-gray and pattern-on-gray conditions). The
performance decrements seen with contrast polarity
alternation and the two microbalanced conditions add
further evidence to the conclusion of Dosher et al.
(1989b) that 1st-order motion detectors are the primary
substrate for the computation of shape. In addition,
there are indications of an input to the shape
computation from 2nd-order motion mechanisms, which is
weak, low in spatial resolution, and concentrated at the
fovea. 2nd-order mechanisms that require temporal
filtering (i.e. detection of flicker) prior to a point
nonlinearity were useless here because of the spatial
resolution required by our stimuli. These sorts of
detectors would only be useful for KDE displays involving
a small number of moving features, rather than the
densely sampled optic flows required for the
determination of precise shapes of curved surfaces from
motion cues. The results from the two-frame experiments
reinforced these conclusions. They also demonstrated that
detection of instantaneous velocity is sufficient for
KDE; acceleration is not required, nor are more than two
views.

MEASURING THE SPATIAL FREQUENCY SELECTIVITY
OF SECOND-ORDER TEXTURE MECHANISMS
Anne Sutter, George Sperling, & Charles Chubb
Human Information Processing Laboratory, New York University, NY, NY 10003
Recent studies of texture and motion perception suggest two parallel
processing systems: a first-order system consisting of selective linear filters
followed immediately by detectors, and a second-order system in which
preprocessing (consisting of an initial stage of linear filtering followed by
rectification) precedes subsequent stages of selective linear filtering and detection.
Here we measure two properties of the second-order system: the contrast
modulation sensitivity as a function of spatial frequency (MTF) of its
second-stage filters, and the relation of initial spatial filtering to second-stage selectivity.
To determine the MTF, amplitude modulation thresholds were determined for
Gabor modulations of a carrier noise. The carrier was spatially bandlimited noise
with an approximate bandwidth of one octave. Four carrier bands were created
with center frequencies of 2, 4, 8, and 16 c/deg. The spatial frequency of the test
signals (imposed amplitude modulations) ranged from 0.5 to 8 c/deg. We used a
staircase procedure that required subjects to specify the orientation (vertical or
horizontal) of the modulating signal.

Results. (1) The threshold amplitude of signal modulation was lowest for 0.5
and 1.0 c/deg. Above 1.0 c/deg, threshold increased with frequency. (2)
Threshold modulation was independent of the spatial frequency of the carrier
noise*. (3) There was no significant interaction of carrier frequency band with the
modulating frequency. These results indicate that the second-stage selective
filters and detectors are most sensitive to frequencies less than or equal to 1 c/deg
but that they are indifferent to the spatial frequency content of the carrier noise
upon which these signals are impressed.

*Jamar & Koenderink, Vis. Res. 25(4), pp. 511-521

*Psychology Department, Rutgers University
For a test patch of isotropic spatial texture P embedded in a
surrounding texture field S, the perceived contrast of P depends
substantially on the contrast of the texture surround S.* When P is
surrounded by a high contrast texture with a similar spatial frequency
content, it appears to be less contrasty than when it is surrounded by a
uniform field. Here we demonstrate that this lateral suppression of P's
apparent contrast by the surrounding texture S is orientation specific. That
is, suppression of apparent contrast of a patch of sinusoidal grating P by a
surround grating S of the same spatial frequency is greatest when the angle
between gratings P and S is 0 deg. Using dynamically phase-shifting
sinusoidal gratings of 3.3, 10 and 20 c/deg, we measured
orientation-specific suppression of apparent contrast at two levels of contrast. Results:
(1) Both parallel and orthogonal S gratings caused suppression of P's
apparent contrast relative to a uniform surround. (2) There was orientation
specificity (greater contrast inhibition by 0 than 90 deg surrounds) for all
S-P combinations except the high-contrast 3.3 c/deg grating and the low
contrast 20 c/deg grating (which was invisible). (3) Orientation specificity
increased with greater spatial frequencies and with lower stimulus
contrasts. The results suggest a contrast perception mechanism in which
both oriented and nonoriented units determine the perceived lightness or
darkness of a point in visual space, and every unit is inhibited primarily by
similar adjacent units.
*Chubb, C., Sperling, G. & Solomon, J. A. (1989) Proc. Natl. Acad. Sci. USA 86, 9631-9635.

THE VISIBLE PERSISTENCE OF STIMULI IN
STROBOSCOPIC MOTION

Joyce E. Farrell,* M. Pavel† and George Sperling
New York University, Washington Square, New York, NY 10012, U.S.A.

(Received 14 November 1988; in revised form 25 September 1989)

Abstract. This paper reports an improved paradigm to measure visible persistence. The stimulus is a pair
of lines stroboscopically displayed in successive positions moving in opposite directions. The subjects'
judgement of simultaneous appearance of all the presented lines is used to estimate visible persistence.
This paradigm permitted independent manipulation of spatial and temporal stimulus separations in linear
motion. The resulting estimates of visible persistence increase with spatial separation up to 0.24 deg of
visual angle and approach a maximum value at larger spatial separations. The results are consistent with
the existence of a hypothetical visual gain mechanism that operates over small retinal distances to
effectively decrease persistence duration with decreasing spatial separation.

Visible persistence   Stroboscopic motion   Apparent motion


INTRODUCTION 
Stroboscopic motion

In artificial representations of natural object motion,
such as in movies, television, and computer driven visual
displays, continuous motion is represented by a
succession of discrete samples. By increasing the
temporal sampling rate of an object moving at a fixed
velocity, one can create an illusion of motion that is
indistinguishable from the appearance of continuous
motion (Sperling, 1976; Watson, Ahumada & Farrell, 1983).
When the sampling rate is not high enough, however, the
appearance of continuous motion is replaced by multiple
images of the moving object.

Consider, for example, the stroboscopic representation of
a single vertical line moving horizontally across a
display screen. For some spatial and temporal separations
of the line in stroboscopic motion, instead of a single
line, observers perceive a number of lines moving
together across the screen (Allport, 1968). An analogous
phenomenon in real motion is the apparent elongation of a
rapidly moving object (Newton, 1720; Allen, 1926). The
obvious explanation for the apparent multiple lines in
stroboscopic motion and the smearing in real motion is
that each flash of the line produces an image whose
visibility persists over time and which, therefore,
temporally overlaps subsequent flashes of the line.

*To whom reprint requests should be addressed; present
address: Hewlett-Packard Laboratories, P.O. Box 10490,
Palo Alto, CA 94303-0971, U.S.A.
†Present address: Department of Psychology, Stanford
University, Stanford, CA 94305, U.S.A.

According to this explanation, the visible persistence of
an image can be estimated by the number of successive
stimuli that appear to be simultaneous. For example, if a
stimulus is visible for approx. 100 msec, it should
appear to temporally overlap stimuli that follow in less
than 100 msec. Previous estimates of the duration of
visible persistence based on this method range between
100 and 300 msec (Coltheart, 1980). When the distance and
time between successive stimuli approach zero, as in the
case of real motion, the duration of visible persistence
can be estimated by the length of an object's blur
streak. Estimates of the duration of visible persistence
based on this latter method (Burr, 1980) range between 2
and 5 msec. Apparently, the procedure for investigating
the persistence of stroboscopically moving stimuli
generates a different estimate of persistence duration
than the procedure for investigating the persistence of
continuously moving stimuli. But should we attribute this
difference to differences in the paradigms used for
estimating persistence duration? Or do different
perceptual mechanisms underlie the visible persistence of
stimuli in stroboscopic ("apparent motion") and
continuous ("real") motion?

Farrell (1984) estimated the visible persistence of
stimuli in stroboscopic motion by asking observers to
report the number of successively presented stimuli that
appeared to be simultaneously visible. She found that the
estimated durations of visible persistence increased with
the distance separating the successive stimuli. This
finding, taken together with reports by Dixon and Hammond
(1972), Allport (1970) and DiLollo and Hogben (1985),
provides an explanation for the paradox that the visible
persistence of continuously moving stimuli is relatively
short (Burr, 1980) when compared to the persistence of
stroboscopically moving stimuli (Allport, 1970; Efron &
Lee, 1971). When the distance between successive stimuli
is small, the duration of visible persistence is small;
as the distance increases, persistence increases. This
reduces the smear generated by moving objects but extends
the time available to process stationary objects (e.g.
Burr, 1980; DiLollo, 1980; Sperling, 1967).

Because visible persistence can have many different
causes, it is important to determine whether lawful
behavior measured using one paradigm extends to other
procedures. In this paper, we first review some previous
methods for estimating visible persistence. We then
describe a new procedure that we believe overcomes some
of the limitations of the previous procedures. Using our
new method, we extend the measurements made by Farrell
(1984) and by DiLollo and Hogben (1985) by investigating
the duration of visible persistence over a wide range of
spatial separations. The new data that we report in this
paper sheds light on the type of mechanism that may
underlie the visible persistence of moving stimuli and
the range over which the mechanism operates.

Paradigms  for  estimating  the  duration  of  visible 
persistence 

The duration of visible persistence of an object in
stroboscopic motion can be estimated by the number of
successive objects that appear to be physically present
at the same time (Allport, 1968; Dixon & Hammond, 1972;
Efron & Lee, 1971). Here, we consider the hypothesis that
for describing the appearance of stroboscopically moving
objects, the visual system can be represented by two
stages. The first stage represents low level perceptual
units and is represented by a spatio-temporal filter
whose response embodies visible persistence; it lengthens
the duration of its visual inputs. The second stage
monitors the perceptual units of the first stage and
decides which of the units are active by comparing their
output to a threshold. The number of simultaneously
active units corresponds to the number of simultaneously
visible stimuli. For example, suppose that a briefly
presented luminous line elicits a visual sensation (the
first stage response) that decays, and, after 100 msec,
the persisting sensation is no longer visible (below
threshold of the second stage). Suppose also that the
line is presented every 100 msec in a new position, as
illustrated in Fig. 1. This system will report that it
sees only one line because the visible persistence of
successive stimuli does not overlap. When the line is
presented every 50 msec, the system reports seeing two
lines because the visible persistence of two successive
stimuli will overlap. By the same reasoning, the system
will report 3 lines when the line is presented every
33 msec and 4 lines when the line is presented every
25 msec. In general, when the perceived number of lines
increases linearly with the rate of stimulus
presentation, then the slope of the linear function can
be used to estimate the duration of visible persistence.

Fig. 1. This figure illustrates the hypothetical case in
which a briefly presented visual stimulus creates a
persisting sensation that decays over time such that
after 100 msec the persistence decays to a level below
which it is no longer visible. In the top panel, the
stimulus is presented in a new position every 100 msec
and a single line should appear to be present at any one
instant in time. The second and third panels show
instances in which successively presented stimuli
generate visual responses that overlap in time. In
general, if the perceived number of stimuli increases
linearly with the rate of stimulus presentation, then the
slope of the linear function can be used to estimate the
duration of visible persistence.
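The two-stage account described above can be sketched as a short simulation. The version below makes a simplifying assumption that is not in the text: the decaying first-stage response is reduced to an all-or-none window, so each flash counts as visible from its onset until a fixed persistence has elapsed. Times are integer milliseconds and the function names are illustrative.

```python
def visible_count(time_ms, interval_ms, persistence_ms):
    """Second stage: count the flashes whose first-stage response is still
    above threshold at time_ms. Flash k starts at k * interval_ms and is
    treated as visible for persistence_ms (ASSUMPTION: all-or-none window)."""
    last_flash = time_ms // interval_ms
    return sum(1 for k in range(last_flash + 1)
               if time_ms < k * interval_ms + persistence_ms)


def mean_visible_count(interval_ms, persistence_ms, start_ms=1000, stop_ms=2000):
    """Average number of simultaneously visible lines over a steady-state window."""
    counts = [visible_count(t, interval_ms, persistence_ms)
              for t in range(start_ms, stop_ms)]
    return sum(counts) / len(counts)


PERSISTENCE_MS = 100
for interval in (100, 50, 33, 25):
    print("every", interval, "msec ->",
          round(mean_visible_count(interval, PERSISTENCE_MS), 2), "lines on average")
```

For intervals that divide the persistence evenly the count is the same at every instant; for the 33 msec interval the count is 3 most of the time and briefly 4, so the average sits slightly above 3.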

Allport (1968, 1970) and Efron and Lee (1971) estimated
the duration of visible persistence from the number of
simultaneously visible lines by means of a computation
very similar to that embodied by the 2-stage system
described above. For example, Efron and Lee (1971)
assumed that visible persistence can be described by a
single real number, its duration p. Efron and Lee
reasoned that the number of stimuli that will appear to
be simultaneous is n = p/t, where t is the time interval
separating two adjacent stimuli, and n is the average
number of observed lines. Implicitly, this prediction
assumes that the probability that the number of
successive stimuli will appear to be simultaneously
visible is proportional to the degree to which the
visible persistence of successive stimuli overlap. Let
the number of lines simultaneously observed on a
particular trial be a random variable N and let n be the
expected value of N. These assumptions lead to the
prediction that:

$$n = E(N) = \max\left(1, \frac{p}{t}\right)$$

When p ≤ t, the expected value of N, E(N), is 1,
representing the fact that observers report seeing a
stimulus even when it is not visible all the time. When
p > t, E(N) is p/t. This prediction is precisely correct
only for integer values of p/t (see below).

Efron and Lee (1971) varied the rate at which a rotating
line was strobed and asked observers to report how many
lines they saw at any one time. They derived the duration
of visible persistence from the slope of the linear
functions relating the strobe rate and the number of
lines observers reported. Estimates of the duration of
visible persistence ranged between 133 and 144 msec.

The most significant difficulty with these procedures
for estimating visible persistence is that
the observer must count the number of perceived
lines. To determine the visible persistence
of stroboscopic stimuli that approximate real
motion, we must estimate the persistence of
closely spaced stimuli. This requires counting a
large number of closely spaced lines, where both
the spacing and the number make counting
impractical. Alternatively, the classical procedure
(Newton, 1720; Allen, 1926) for estimating
persistence of an object in real motion
(revived by Burr, 1980) utilizes the length of the
object's blur streak to estimate visual persistence.
While it avoids the counting problem, this
method still requires the subject to estimate the
size of a rapidly moving object.

A second problem occurs when the spatial
position of the stimuli in stroboscopic or real
motion is uncertain. In this paradigm (Efron
and Lee, 1971) the experimenter has no control
over where or when the count of visible lines
occurs. Further, the experimenter does not
know during what fraction of the trajectory
the reported number of lines is visible.

Third, the observed duration of persistence
and the number of simultaneously visible stimuli
are not absolutely constant from trial to trial
but, like everything else psychologists measure,
vary. The stochastic nature of these measures
must be reflected in the data collection and
analysis procedures. Thus, the observed duration
of visible persistence should be represented
by a random variable. The explicit
treatment of persistence as a random variable
in data analysis, and the measurement of its
distribution, may prove useful for evaluation of
potential theories.

We propose here a paradigm and a method of
analysis to overcome the problems of counting,
of spatial indeterminacy, and of measuring the
random variation of persistence. The paradigm
is used to extend the range of spatial and
temporal conditions over which it has been
possible to measure persistence in stroboscopic
motion. The analysis is used to obtain estimates
of the complete trial-to-trial distributions
of persistence in the various conditions.

The  paradigm 

In our experiments, two vertical lines, one
above the other, move horizontally in stroboscopic
motion in opposite directions over a
fixed distance (Fig. 2). Successive positions are
separated by a fixed interval of time Δt and a
displacement of Δx to the right for one line and
−Δx (leftward) for the other. For different Δt
and Δx, observers report whether or not all the
lines in both paths appear to be simultaneously
present. They are instructed to respond "yes" if
they perceive a flickering grating composed of
all the positions of the lines and to respond
"no" if they do not.

To estimate the duration of visible persistence
with this paradigm, we assume that each briefly
presented stimulus generates a visual response
that decays over time. If the first presented
stimulus in one row is still visible when the last
presented stimulus occurs in the other row
immediately above or below it, the observer
responds "visible"; otherwise, "not visible".
This paradigm determines the proportion of
trials on which a stimulus remains visible from
the first flash to the beginning of the last flash
in a row.

Responses are inherently probabilistic. We
assume that they reflect trial-to-trial variability
in either or both the temporal waveform of the
persistence response and the subject's criterion
for deciding whether the stimulus is
visible. The analysis takes into account the
probabilistic nature of the data in order to
separate the effects of the retinal separation on
(1) the mean duration of visible persistence and
on (2) the trial-to-trial variation of visible persistence.
The analysis does not distinguish between
causes of variability, such as fluctuations
in the underlying visual response and fluctuations
in the threshold criterion.

EXPERIMENT 1

Method 

Subjects. Data were collected from four
observers, including one of the authors (JF). All
observers had normal or corrected-to-normal
vision.

Stimuli. The stimuli were vertical lines drawn
on an HP1310 CRT display with a P4 phosphor.
The background of the display was illuminated
by incandescent lights that produced a background
luminance of 0.35 cd/m². Subjects
viewed the display from a distance of 94 cm and
each vertical line subtended 0.235 deg of visual
angle (0.386 cm). Each line was displayed for
less than 1 msec at the same stimulus intensity.
The horizontal and vertical distance between the
centers of adjacent raster pixels was 0.0193 cm
and each stimulus was composed of a vertical
column of 20 raster pixels. Each pixel had a
luminance directional energy (cf. Sperling, 1971)
of 0.09 cd-sec. This stimulus intensity will hereafter
be referred to as the reference intensity.
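
As a quick consistency check on the stimulus geometry, the line height and its visual angle can be recomputed from the pixel pitch, pixel count and viewing distance given above. This is a minimal sketch; the function and constant names are ours.

```python
import math

PIXEL_PITCH_CM = 0.0193        # center-to-center pixel distance (from the text)
PIXELS_PER_LINE = 20           # pixels in one vertical line (from the text)
VIEWING_DISTANCE_CM = 94.0     # viewing distance (from the text)


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by an object of a given size centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


line_height_cm = PIXELS_PER_LINE * PIXEL_PITCH_CM

if __name__ == "__main__":
    print(f"line height: {line_height_cm:.3f} cm")
    print(f"visual angle: {visual_angle_deg(line_height_cm, VIEWING_DISTANCE_CM):.3f} deg")
```

Both values agree with the figures quoted in the stimulus description (0.386 cm and 0.235 deg) to three decimal places.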

Two vertical lines were presented in a succession
of positions, each position following the
other by a fixed interval of time, Δt, and displaced
to the right (or left) by a distance, Δx, as
shown in Fig. 2. One of the vertical lines was
presented with its bottom 0.12 deg above a
fixation point and extending upward for
0.24 deg. The other vertical line was presented
symmetrically 0.12 deg below the fixation point.
The two vertical lines were presented in the
same horizontal positions, differing only in a
spatial shift in the vertical direction and in the
temporal order of presentation. On each trial,
the direction of motion of the upper line was
randomly chosen; the lower line moved in the
opposite direction.

Fig. 2. The display for Expts 1-3: two vertical lines were
presented in a succession of positions along the paths p and
p' as shown above. Each position of the line followed the
other by a fixed interval of time, Δt, and was displaced by
a fixed distance, Δx, in a constant direction (left or right).

Subjects were instructed to stare at the
fixation point for the duration of each stimulus
presentation. The fact that the two vertical lines
moved in opposite directions helped subjects to
keep their gaze on the fixation point and discouraged
them from tracking the stimulus with
their eyes. Making any eye movement during
the display would often cause it to appear
distorted (see Farrell, Putnam & Shepard, 1984)
and subjects quickly learned to suppress eye
movements.

Across trials, stimuli differed in the distance
between successive lines, Δx, the time interval
separating the successive lines, Δt, and the total
number of lines that were presented, N. The
distance, Δx, separating successive positions of
each vertical line was either 0.12, 0.18 or
0.36 deg of visual angle. The length of the
horizontal path of each vertical line was equal
to the product of (N − 1) and Δx. For example,
when Δx was 0.12 deg of visual angle, N was 13,
16 or 19 in order to obtain path lengths corresponding
to 1.44, 1.80 and 2.16 deg, respectively.
Similarly, when Δx was 0.18 deg, N was 9,
[...] one line begins from the left of the
fixation point and proceeds to the right and
the other line begins from the right of the
fixation point and proceeds to the left. At the
end of each trial, the subject pressed one of two
response keys to indicate whether or not all
successive presentations of the lines on both
trajectories appeared to be simultaneously visible.
Subjects were specifically instructed to
respond "yes" if they perceived a flickering
grating composed of all the positions of the lines
above and below the fixation point and to
respond "no" otherwise.
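
The rule stated in the stimulus description, path length = (N − 1)Δx, fixes the number of lines for any combination of path length and separation. The sketch below simply evaluates that rule for the three separations and three path lengths of this experiment; the function name is ours.

```python
def lines_for_path(path_deg: float, dx_deg: float) -> int:
    """Number of line positions N such that (N - 1) * dx equals the path length."""
    steps = path_deg / dx_deg
    n_steps = round(steps)
    if abs(steps - n_steps) > 1e-6:
        raise ValueError("path length is not a whole multiple of dx")
    return n_steps + 1


if __name__ == "__main__":
    for dx in (0.12, 0.18, 0.36):
        counts = [lines_for_path(path, dx) for path in (1.44, 1.80, 2.16)]
        print(f"dx = {dx:.2f} deg -> N = {counts}")
```

For Δx = 0.12 deg the rule reproduces the values of N given in the text (13, 16 and 19).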

An experimental session consisted of three
blocks of 120 trials corresponding to the three
different path length conditions. Within each
block of trials, each condition of spatial separation
Δx was presented 40 times. The 40 repetitions
were presented within two interleaved
staircases. The total 120 trials resulting from the
product of the 3 Δx, the 20 repetitions per Δx,
and the 2 staircase conditions were presented in
a random order.

The interstimulus interval was controlled in a
modified up-down staircase (Levitt, 1970). The
starting value of the interstimulus interval (ISI)
in the first experimental session was 50 msec. If
the subject responded "no", the Δt was decreased
by 2 msec and this new Δt was
stored for the next presentation of this staircase.
If the subject responded "yes" for two presentations
of the same stimuli, the ISI was increased
by 2 msec and this new ISI was stored
to be presented later in the pre-arranged random
sequence of trials. The staircase procedure
adjusts the temporal separation so that 71% of
the time the N successively presented stimuli
appear to be simultaneously present. This same
procedure was repeated for another interleaved
staircase. The complete set of data provided by
the two interleaved staircases allows us to estimate
psychometric functions for each condition
of spatial separation.
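
The update rule of the staircase described above can be sketched as follows. This is an illustration under our own naming; the floor at 0 msec is an added safeguard that the text does not mention. The 71% figure follows from the rule itself: the interval is equally likely to move up or down when the probability of two consecutive "yes" responses is 0.5, i.e. when P(yes) = √0.5 ≈ 0.707.

```python
STEP_MS = 2.0                 # step size given in the text
TARGET_P_YES = 0.5 ** 0.5     # equilibrium P("yes") of a two-yes-up, one-no-down rule


def update_staircase(dt_ms, yes_run, said_yes, step_ms=STEP_MS):
    """Return (new_dt_ms, new_yes_run) after one response on this staircase.

    A single "no" shortens the interval; two consecutive "yes" responses
    lengthen it. yes_run counts consecutive "yes" responses so far.
    """
    if not said_yes:
        # Assumption: never let the interval go below zero.
        return max(0.0, dt_ms - step_ms), 0
    if yes_run + 1 == 2:
        return dt_ms + step_ms, 0
    return dt_ms, yes_run + 1


if __name__ == "__main__":
    dt, run = 50.0, 0  # starting ISI of the first session
    for response in (True, True, False, True, False):
        dt, run = update_staircase(dt, run, response)
        print(dt, run)
    print(f"target proportion of 'yes': {TARGET_P_YES:.3f}")
```

Two such staircases would be run independently and interleaved in the random trial sequence.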

In subsequent experimental sessions, the initial
value of the Δt was set equal to the estimated
71% threshold from the earlier sessions.

Method of analysis. Our analysis rests on the
assumption that a briefly presented stimulus
line generates a visual response that decays over
time, and that after some time the visual response
reaches a threshold below which it is no
longer visible. We make no assumption about
the shape of the visual response: we simply
assume that as long as the visual response
generated by the stimulus is above threshold,
the stimulus will appear to be present. If subjects
report that all N lines appear to be simultaneously
present, then we assume that, for
some instant during that particular trial, the
visual responses generated by the N lines were
all above threshold. As a practical matter, from
the subject's point of view, the question of N
visible lines reduces to the simultaneous visibility
of the first and last line. No subject
reported that the first and last lines of a trajectory
were visible but some interior line had
vanished.

Let the observable time interval during which
the images of all N lines are visible (i.e. above
threshold) be a random variable, D. As noted
earlier, the random variability in D may be the
result of threshold variability in the decision
stage, variability of the decay function, or other
random effects (noise). At the outset, we assume
the distribution of D to be normal with mean τ
and variance σ². This assumption is directly
tested in the process of data analysis. For given
values of Δt and N, we wish to find p(Δt, N),
the estimate of the probability that the first and
last lines will appear to be visible simultaneously.
p(Δt, N) is equal to the probability that
the first line has not decayed below threshold
during the time interval (N − 1)Δt separating
the onset of the first and last stimulus, i.e.

$$p(\Delta t, N) = \mathrm{Prob}[D > (N-1)\Delta t] = 1 - \Phi\{[(N-1)\Delta t - \tau]/\sigma\}. \qquad (1)$$
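
Equation (1) is straightforward to evaluate numerically, taking Φ to be the standard normal cumulative distribution function (implied by the normality assumption for D). The sketch below uses hypothetical values of τ and σ chosen only for illustration; the function names are ours.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_all_visible(dt_ms: float, n_lines: int, tau_ms: float, sigma_ms: float) -> float:
    """Equation (1): probability that D exceeds the span (N - 1) * dt."""
    span_ms = (n_lines - 1) * dt_ms
    return 1.0 - normal_cdf((span_ms - tau_ms) / sigma_ms)


if __name__ == "__main__":
    # Hypothetical parameters: tau = 100 msec, sigma = 20 msec, N = 11 lines.
    for dt in (5.0, 10.0, 15.0):
        print(dt, round(p_all_visible(dt, 11, 100.0, 20.0), 3))
```

When the span (N − 1)Δt equals τ the predicted probability is 0.5, and it falls as Δt grows.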

[...] was performed using STEPIT (Chandler, 1965) [...]
and the resulting mean τ and variance σ² of
persistence duration represent the data. The
estimates are used to predict the frequencies of
"visible" responses for each subject in
each condition of Δx for each individual Δt
reached by the staircase. Each of the 12 estimated
normal distributions successfully predicts
the response probabilities. We cannot reject the
predictions on the basis of a χ² test
at p < 0.05 for any subject in any stimulus
condition.

[...] shows that the mean duration of visible persistence
increases with the distance separating the
[...] (Farrell, 1984).

Second, Fig. 3 shows that the mean τ and
standard deviation σ of the visible persistence
duration generated by a briefly presented stimulus
line do not vary with the number of stimuli, N,
that are successively presented. The mean and
standard deviation depend only on the distance
separating successive stimuli, Δx. This result is
also consistent with previous findings: Efron
and Lee (1971) and Farrell (1984) observed that
the number of successive stimuli that appear [...]

Fig. 3. The estimated mean (solid symbols) and standard deviation (open symbols) of the visible
persistence of a briefly presented visual stimulus plotted as a function of the distance separating the
stimulus line from other stimuli that occur later in time, with the length of the stimulus path as the
parameter. Circles, triangles and squares correspond to stimulus paths of 1.44, 1.80 and 2.16 deg visual
angle, respectively. Each panel represents data from one subject.

Finally, Fig. 3 shows that the variability of
persistence duration increases with retinal separation
for one of the four subjects (JF). As
noted earlier, the individual differences in the
variability of the duration of visible persistence
across trials may reflect changes in the subjective
threshold criterion or changes in the
underlying visual response.

EXPERIMENT 2

In the previous experiment we found that for
all subjects the mean duration of visible persistence
increased with the distance separating the
successive stimuli and, for one subject, the variability
of persistence duration also increased
with the spatial separation. This result is consistent
with previous studies that used different
experimental paradigms for estimating the duration
of visible persistence (Allport, 1968, 1970;
Efron & Lee, 1971). These previous studies have
not reported limits to the increase of persistence
duration with spatial separation. Nonetheless, it
seems reasonable to assume that there is both a
minimum and maximum duration of visible
persistence. In order to place bounds on the
increase in persistence duration with spatial
separation, we conducted a second experiment
and estimated the duration of visible persistence
over a wider range of spatial separations.

Method 

Subjects. The same four observers who participated
in the first experiment (EW, DP, JG
and JF) served as subjects in this experiment.

Stimuli. As in the previous experiment,
the stimuli differed in the distance between
successive lines, Δx, the time interval separating
the successive lines, Δt, and the total number
of lines that were presented, N. The number of
lines (N) was 25, 13, 9, 7, 5, 4 or 3 for Δx
corresponding to 0.06, 0.12, 0.18, 0.24, 0.36,
0.48 or 0.72 deg visual angle, respectively. The
lines were displaced over a total path length of
1.44 deg. All other aspects of the stimuli were
identical to Expt 1.

Procedure. Each experimental session consisted
of two or three blocks of 280 trials.
Within each block of trials, each Δx was presented
40 times. The 40 repetitions were separated
into two staircase conditions. The 280 trials
were arranged in a random order of presentation.

One observer viewed 6 blocks of trials in two
separate experimental sessions, another observer
viewed 4 blocks of trials in two separate
sessions, and two observers viewed 3 blocks of
trials in a single experimental session. Observers
rested between blocks of trials.

As in the previous experiment, two interleaved
random staircases were used to distribute
the data around a 71% threshold criterion. Depending
on the subject's response, the temporal
separation was adjusted such that 71% of
the time the N successively presented stimuli
appeared to be simultaneously present. The
complete data set can then be used to estimate
psychometric functions for each condition of
spatial separation.

Results  and  discussion 

As in the previous analysis, we assume that
the probability that observers will report that
the N successive lines appear to be simultaneously
present is given by equation (1).
Again, using the maximum likelihood procedure,
we estimated the values of τ and σ that
maximized the match between the predicted and
the observed response probabilities for each
observer, Δx, and for all values of Δt reached by
the staircases.
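
A maximum likelihood estimate of τ and σ under equation (1) can be sketched as follows. The source mentions STEPIT as its optimizer; the exhaustive grid search below is a deliberately simple stand-in, and the data layout (tuples of Δt, N, number of "yes" responses, number of trials) and all function names are our assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_all_visible(dt_ms, n_lines, tau_ms, sigma_ms):
    """Equation (1): P[D > (N - 1) * dt] for normally distributed D."""
    return 1.0 - normal_cdf(((n_lines - 1) * dt_ms - tau_ms) / sigma_ms)


def neg_log_likelihood(data, tau_ms, sigma_ms):
    """Binomial negative log-likelihood of the "yes" counts.

    data: iterable of (dt_ms, n_lines, n_yes, n_trials).
    """
    nll = 0.0
    for dt_ms, n_lines, n_yes, n_trials in data:
        p = p_all_visible(dt_ms, n_lines, tau_ms, sigma_ms)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite
        nll -= n_yes * math.log(p) + (n_trials - n_yes) * math.log(1.0 - p)
    return nll


def fit_grid(data, taus, sigmas):
    """Return the (tau, sigma) pair on the grid with the highest likelihood."""
    best = min((neg_log_likelihood(data, t, s), t, s) for t in taus for s in sigmas)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical counts for N = 13 lines at five values of dt (msec).
    data = [(6, 13, 37, 40), (7, 13, 31, 40), (8, 13, 23, 40),
            (9, 13, 14, 40), (10, 13, 6, 40)]
    print(fit_grid(data, range(80, 125, 5), range(10, 35, 5)))
```

A finer grid, or a general-purpose optimizer started from the grid optimum, would refine the estimates; a χ² comparison of predicted and observed frequencies could then be used to test the normality assumption.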

Figure 4 shows the estimated mean τ and
standard deviation σ of persistence duration
plotted as a function of the distance Δx for each
of the four subjects. Of the 28 estimated normal
distributions, only one would be rejected by χ²
at p < 0.05. As in the previous experiment, we
found that over a limited range of spatial separations
the mean duration of visible persistence
increases with spatial separation. In addition,
we found that for three of the four subjects
(EW, JF, JG), the mean duration of visible
persistence approaches a maximum (asymptote)
value at the larger spatial separations. The
fact that the duration of visible persistence
approaches a maximum value at large spatial
separations suggests that the mechanism by
which the visual system modulates the duration
of visible persistence operates over small spatial
separations.

Fig. 4. [...] persistence derived from Expt 1 are plotted as squares.

Figure 4 also shows that the variability in the
duration of visible persistence increases with
spatial separation for three of the four subjects
(DP, JF, JG) and that the variability is greater
at large spatial separations. Most theories of
persistence would predict a correlation of τ and
σ. For example, if the slope of the decaying
visible persistence were to decrease over time,
any variability in the threshold criterion for
visibility would have greater effects at longer
persistence durations. The variability in the
persistence estimates for large separations is
substantial, however, particularly for subjects
JF and DP. This result reduces our confidence
in the persistence estimates for large spatial
separations.

Finally, Fig. 4 shows the mean and standard
deviation of persistence duration estimated
from the results obtained in Expt 1. The estimates
obtained from Expt 1 are based on stimulus
conditions in which the number of successive
stimuli, N, varied. The estimates obtained from
Expt 2 are based on stimulus conditions with
constant N. Despite these differences, the mean
persistence durations measured in the two experiments
fall within the variability in persistence
duration for each condition of spatial
separation.

EXPERIMENT 3

In the previous experiments, we were able to
estimate the mean τ and the variability σ of the
duration of visible persistence of a briefly presented
visual stimulus as a function of the
distance, Δx, separating that stimulus from
other stimuli that occur later in time. We found
that the estimated persistence duration τ increases
with Δx and, for 3 of 4 subjects, so does
σ. We interpret the average duration τ as the
time during which the response to a stimulus
remains above a fixed threshold. In the following
sections of this paper we examine the implications
of the empirical findings in terms of
more formal models. To simplify our analysis
we consider only expected values and, for the
time being, we ignore variability.

The results discussed thus far may be interpreted
in terms of two types of models. In one
type of model the shape of the actual temporal
response depends on nearby stimuli. For example,
the presence of an adjacent stimulus may
increase the rate of decay of the response (see
Fig. 5a). In a simple exponential system this can
be interpreted as a reduction in time constant.
We will call this type of model the rate of decay
model. In the second type of model, the shape
of the temporal response may be invariant; only
its amplitude is reduced by the presence of
adjacent stimuli (see Fig. 5b). We will refer to
this type of model as the gain model. The rate
of decay model places no constraint on the
shape of the temporal response, which can vary
with the presence of adjacent stimuli. The gain
model constrains the shape of the temporal
response to be invariant and, therefore, separable
from the influence of adjacent stimuli. In
the sections that follow we explore the extent
to which a gain model can account for the
influence of adjacent stimuli on the duration of
visible persistence. We first consider a more
formal model of subjects' performance and then
describe an experiment to address this issue
empirically.

Fig. 5. Hypothetical mechanisms for modulating the duration
of visible persistence. (a) The rate of decay model
assumes that the presence of an adjacent stimulus increases
the rate of decay, and therefore the shape, of the temporal
response. (b) The gain model assumes that the shape of the
temporal response is invariant; only its amplitude is reduced
by the presence of adjacent stimuli.
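
The distinction between the two model types can be illustrated with a toy calculation. The sketch assumes a purely exponential response, v(t) = A exp(−t/T), and a fixed criterion; the exponential form, the parameter values and the function name are illustrative assumptions only, since the text makes no commitment to a response shape.

```python
import math


def persistence_ms(amplitude: float, time_constant_ms: float, criterion: float) -> float:
    """Time at which A * exp(-t / T) falls to the criterion (0 if never above it)."""
    if amplitude <= criterion:
        return 0.0
    return time_constant_ms * math.log(amplitude / criterion)


if __name__ == "__main__":
    criterion = 0.1
    # Isolated stimulus (hypothetical baseline): A = 1, T = 50 msec.
    print("baseline:     ", persistence_ms(1.0, 50.0, criterion))
    # Rate of decay model: a nearby stimulus shortens the time constant.
    print("rate of decay:", persistence_ms(1.0, 25.0, criterion))
    # Gain model: a nearby stimulus scales the amplitude, shape unchanged.
    print("gain:         ", persistence_ms(math.sqrt(0.1), 50.0, criterion))
```

With these particular values both models shorten persistence by the same amount, which shows why, under this assumed form, a single persistence estimate cannot separate them and an additional manipulation is needed.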

Let us denote the visibility at time t due to a
stimulus with intensity l presented at time t = 0
as $v_{\Delta x}(l, t)$. As before, Δx represents the spatial
separation of adjacent stimuli. For simplicity
we assume that v is monotonically decreasing
(decaying) in time and monotonically increasing
with luminance. The value of visibility, v, is
used by the subjects to make a decision
about the presence of a visible stimulus at each
location.

An implicit assumption underlying our data
analysis thus far is that the stimulus is visible
whenever v was large enough to exceed a fixed
threshold c. The estimation of the visible persistence
from the results of Expts 1 and 2 amounted
to estimating τ, such that

$$v_{\Delta x}(l, \tau) = c. \qquad (2)$$

The estimate of mean persistence duration, or
simply τ, as a function of Δx and Δt for a
constant value of luminance l was justified to the
extent that the criterion c is independent of Δx and
Δt, i.e. that the stimulus is visible whenever the
visibility function v is greater than a fixed
threshold value, c, and that c is constant for all
Δx and Δt.

The  gain  type  of  model  is  based  on  the  idea 
that  the  distance  separating  successive  stimuli 
affects  only  the  amplitude  of  the  underlying 
temporal  response,  v.  The  amplitude  of  the 
response  is  likely  to  depend  on  the  stimulus 
luminance  as  well.  Therefore,  in  order  to  de¬ 
velop  a  gain  type  of  model,  it  is  necessary 
to  separate  the  effects  of  luminance  and  the 
effects  of  spatial  separation  on  the  temporal 
response, v. To do this, we first examine the
effects  of  luminance  on  estimates  of  persistence 
duration. 

Method 

An experiment to test the effects of luminance
was performed. The method, apparatus, procedure
and paradigm were identical to those of
Expts 1 and 2 except that, in any given trial, the
luminance of the briefly presented line was 0.72,
1.48 or 3.2 times the reference intensity (see
Stimuli in Expt 1) and the spatial separation of
successively presented lines was 0.06, 0.12, 0.36
or 0.72 deg visual angle. The distance between
the centers of adjacent pixels was 0.0193 cm in
both the vertical and horizontal direction.

Fig. 6. Estimated mean (solid symbols) and standard deviation (open symbols) of persistence duration
plotted as a function of the distance separating successive stimuli with stimulus intensity as a parameter.
Stimulus intensity is specified as a fraction of the reference intensity (see Stimuli for Expt 1).

Results 

In Fig. 6, the estimated means and standard
deviations in visible persistence are plotted as a
function of spatial separation with stimulus
luminance as a parameter for the two subjects,
JF and DP. Figure 6 shows that there were no
systematic effects due to stimulus luminance.
Differences in the mean duration of visible
persistence due to stimulus luminance are small
and inconsistent and can be explained by the
variability of persistence duration; for each condition
of spatial separation, the mean duration
of visible persistence estimated for a stimulus of
a given luminance value falls within the standard
deviation of the persistence durations estimated
for stimuli presented in any of the three
luminance values. The results of this experiment
can be described very simply: the persistence
estimates are invariant with respect to 1:4 luminance
changes.


Discussion

The visibility criterion depends on peak visibility.
The goal of the following discussion is to
examine how well the data can be accounted
for by a model that assumes that the visibility
of a briefly presented line can be represented
as a product of three different functions depending
on luminance, distance and time, respectively.
We begin by noting that brighter flashes
do not persist longer than dim flashes. This
result suggests that the criterion c depends on
luminance in the same manner as does the
visibility v. In other words, the results are
consistent with the hypothesis that the criterion
is a threshold defined in terms of a fixed
fraction of the initial amplitude of the visual
response at time t = 0 which is, in turn, a
monotonically increasing function of the maximum
luminance.

We can express the notion of a relative
criterion that is determined by the brightest
stimulus on a given trial formally as follows.
Let $l_1$ be the luminance of the brightest,
briefly presented stimulus line on a given trial.
Another stimulus line presented with luminance
l on the same trial will be visible after a delay
t if:

$$v_{\Delta x}(l, t) \ge c(l_1), \qquad (3)$$

where c is a monotonically increasing function
of the maximum luminance.

Separability of luminance and distance effects.
The threshold criterion c is, as before, assumed
to be independent of the spatial and temporal
stimulus parameters, Δx and Δt. At the visibility
threshold, the inequality (3) becomes an equality
and we can divide both sides of this equation
by the threshold c. The resulting ratio v/c = 1 is
independent of stimulus luminance. Consider trials
where all stimuli are presented with the same
luminance l. Then $l = l_1$ and the ratio v/c can be
used to define a new function w:

$$w_{\Delta x}(t) = \frac{v_{\Delta x}(l, t)}{c(l)}, \qquad (4)$$

which does not depend on the luminance level.
We have already defined w to be independent of
luminance at threshold. If we further assume
that w is independent of luminance above the
threshold, then the visibility v can be written as
a product of two functions:

$$v_{\Delta x}(l, t) = c(l)\, w_{\Delta x}(t), \qquad (5)$$

where c is a monotonically increasing function
of luminance, l, and w is a monotonically decreasing
function of t and increasing in Δx. Thus
v is the product of a function of luminance and
another function w that depends on time and
separation. Note that the function w is independent
of luminance and embodies the dependence
of persistence on spatial separation Δx.

Separability of time and distance in a gain
control model. With this framework at hand, we
are ready to formalize the assumption underlying
the gain type of model. In that model, the
presence of adjacent stimuli only modulates the
magnitude of the response. That is, the function
w itself can be separated into a product of two
functions, gain g and temporal response h, as
follows:

$$w_{\Delta x}(t) = g(\Delta x)\, h(t),$$

and the visibility function can be written as

$$v_{\Delta x}(l, t) = c(l)\, g(\Delta x)\, h(t). \qquad (6)$$

The separability of time, distance and luminance
expressed in equation (6) predicts that a
decrement in luminance could completely compensate
for a corresponding increment in separation
Δx. Alternatively, a decrease in the
visibility due to small spatial separation can be
compensated by an increase in visibility with
luminance. Experiment 4 was aimed at discovering
the relationship between spatial separation
and luminance. If we know how the amplitude
of the visual response changes with luminance,
and know how luminance and spatial separation
trade off in determining the duration of
visible persistence, then we can derive how the
amplitude of the visual response changes with
spatial separation.

EXPERIMENT 4

Experiment 4 tests the extent to which the
gain type of model holds and thereby yields
more information on the temporal response, h.
The approach is based on the measurement of
a trade-off between the function of luminance,
c(l), and the function of separation, $g(\Delta x)$.
Since neither c(l) nor $g(\Delta x)$ depends on $\Delta t$ [i.e.
they are separable from $h(\tau)$], we investigated
the effects of luminance and spatial separation
when $\Delta t = 0$.

Method 

Subjects. The same four observers who participated
in the previous experiments (EW, DP,
JG and JF) served as subjects in this experiment
as well.

Stimuli. As in Expt 2, the stimuli consisted of
two sets of vertical lines presented 0.12 deg
above and below a fixation point (see Fig. 2). In
fact, the stimuli were equivalent to the stimuli in
Expt 2 with the following exceptions. Rather
than present the lines successively, the lines were
presented simultaneously. In addition, the intensity
of each line was varied as a function of
the position of the line; across a row of vertical
lines, the intensity of each line decreased exponentially
with stimulus position as illustrated in
Fig. 7. Let $l_n$ be the intensity of a line in position n.
The intensity of the line in the leftmost (or
rightmost) position, $l_1$, was initialized to the
reference intensity (see Stimuli in Expt 1). The
intensity of the line in a position to the right
(or left) of n was $l_n a$, where a is the
slope of the exponential decrease. On each trial,
the direction of the exponential decrease in
intensity (left-to-right or right-to-left) of lines
presented above the fixation point was chosen
randomly; the intensity of the lines below the
fixation point decreased exponentially in the
opposite direction. The spatial separation $\Delta x$ is
varied by increasing n over a range of 3-25 as
in Expt 2.

Fig. 7. The display for Expt 4: two sets of vertical lines were
simultaneously presented above and below a fixation point.
The height of each vertical line represents stimulus luminance,
which decreased exponentially with stimulus position.
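The exponential intensity profile described in the Stimuli paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function names, the unit reference intensity, and the example slope are hypothetical and do not come from the original apparatus.

```python
def line_intensities(reference, a, n_lines):
    """Intensities for one row: each line is `a` times its brighter neighbor.

    `reference` is the intensity of the first (brightest) line and `a`
    (0 < a <= 1) is the slope of the exponential decrease.
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("slope a must lie in (0, 1]")
    return [reference * a ** k for k in range(n_lines)]


def display_rows(reference, a, n_lines, top_left_to_right=True):
    """Return (top_row, bottom_row); the bottom row decreases the opposite way."""
    row = line_intensities(reference, a, n_lines)
    top = row if top_left_to_right else row[::-1]
    bottom = top[::-1]
    return top, bottom


# Hypothetical example: 4 lines, reference intensity 1.0, slope 0.5.
top, bottom = display_rows(1.0, 0.5, 4)
```

With these definitions the ratio of the dimmest to the brightest line in a row of n lines is a raised to the power n - 1, which is the relative intensity that the staircase on a effectively controls.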

Procedure. The subject initiated a trial by
pressing a response key. After 600 msec, the
stimuli were flashed for 1 msec. At the end of
each trial, the subject pressed one of two response
keys to indicate whether or not all the
vertical lines above and below the fixation point
were visible. Subjects were instructed to use the
same criterion for visibility that they used in the
previous experiments: subjects were to respond
"yes" if they perceived a grating composed of all
the lines above and below the fixation point and
to respond "no" otherwise.

At the beginning of each session, subjects
repeated 280 stimulus trials from the previous
experiment. These 280 trials served to remind
subjects of the visibility criterion used in previous
experiments and to encourage them to use
the same visibility criterion in this experiment.
Subjects then viewed 3 blocks of trials, each
block consisting of 160 trials. Subjects rested
between blocks of trials.

Across the three blocks of trials, each condition
of spatial separation $\Delta x$ was presented 60
times. The 60 repetitions were presented within
two interleaved staircases. The total 480 trials
resulting from the product of the 7 $\Delta x$, the 60
repetitions per $\Delta x$, and the 2 staircase conditions
were presented in random order.

The rate of the exponential decrease in stimulus
intensity a was controlled by a modified
up-down staircase. The starting value of a was
0.99. If the subject responded "yes", a was
decreased by 0.01 and this new a was stored for
the next presentation of this staircase. If the
subject responded "no" for two repetitions of
the same stimuli, a was increased by 0.01 and
this new a was stored. Under the assumption
that a is a normally-distributed variable, the
staircase procedure converges to the a for
which 71% of the time all the n lines are visible
to the observer for each condition of spatial
separation, $\Delta x$. All the data were used to
estimate the entire psychometric functions.
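The update rule of the staircase can be sketched in a few lines. The sketch assumes that the two "no" responses must be consecutive presentations of the same staircase and that a "yes" resets the count; the text does not say this explicitly, and the function names are hypothetical.

```python
def staircase_step(a, said_yes, pending_no, step=0.01):
    """Apply one trial of the modified up-down rule.

    Returns the new slope `a` and the number of "no" responses
    accumulated since the last change of `a`.
    """
    if said_yes:
        # One "yes" makes the task harder: steeper decrease (smaller a).
        return round(a - step, 10), 0
    if pending_no + 1 >= 2:
        # Two "no" responses (assumed consecutive) make the task easier.
        return round(a + step, 10), 0
    return a, pending_no + 1


def run_staircase(responses, start=0.99, step=0.01):
    """Return the sequence of a values visited for a list of yes/no responses."""
    a, pending = start, 0
    track = [a]
    for said_yes in responses:
        a, pending = staircase_step(a, said_yes, pending, step)
        track.append(a)
    return track


# Hypothetical response sequence: yes, no, no, yes.
track = run_staircase([True, False, False, True])
```

In the experiment two such staircases were interleaved for each spatial separation, so a real implementation would keep one (a, pending_no) pair per staircase and per condition.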

Results  and  discussion 

Psychometric functions relating the probability
of reporting that all n lines were simultaneously
visible to the relative intensity of the
dimmest line were calculated for each subject
and each condition of spatial separation, $\Delta x$. In
Fig. 8, the relative intensity of the dimmest line
(expressed as the normalized ratio of the minimum
and maximum line intensities) accompanying
50% response probabilities is plotted as
a function of the spatial separation for each
subject. As Fig. 8 shows, the relative line intensities
required for all n lines to appear to be
visible decreased with spatial separations up to
0.24 deg of visual angle. For larger spatial separations,
the relative line intensities required to
see all n lines do not vary systematically and
therefore we conclude that the intensities are
independent of spatial separation.

The results of Expt 4 can be interpreted in
terms of the gain control model. In particular,
considering the form of the visibility function v
given by equation (6), we set $\tau = 0$ and interpret
Expt 4 as finding values of the dimmest, N-th
line $l_N$ for each $\Delta x$ such that:

$$c(l_N(\Delta x))\,g(\Delta x)\,h(0) = c(l_1), \tag{7}$$

where $l_1$ is the first (brightest) line. There are
three unknown functions in this equation, c, g
and h, and our goal is to determine h. We do that
in two steps. First, we use previous information
on intensity scaling to assume a reasonable form
for the criterion function c. We then combine
the results of Expts 1, 2 and 4 in order to
eliminate g.

The criterion function c represents the observers'
adjustments to changes in luminance. To
proceed with our analysis we need to make an
additional assumption about the function c(l);
in particular, we assume c(l) to be a power law.
This assumption is consistent with at least two
empirical considerations. First, the classical
scaling data derived from magnitude estimation
experiments (Stevens, 1957) suggest that perceived
brightness is a linear function of luminance
raised to a power. Second, the
assumption is consistent with the luminance
invariance observed in Expt 2.

Substituting $l^{\beta}$ for c in equation (7) yields:

$$l_N^{\beta}(\Delta x)\,g(\Delta x) = c_0\,l_1^{\beta},$$

where $c_0$ is a constant [incorporating h(0)].
Taking logarithms of both sides yields the following
equation relating $\Delta x$ and l:

$$\log(g(\Delta x)) = \log(c_0) - \beta\log\left[\frac{l_N(\Delta x)}{l_1}\right]. \tag{8}$$

This equation represents the relationship between
two functions of the stimulus separation,
$l_N(\Delta x)$ and $g(\Delta x)$. Our primary goal is to use
equation (8) to combine the results of Expt 4
with those of the earlier experiments and
directly evaluate the shape of the temporal
response, h. It is also possible, however, to
examine whether there exist plausible gain functions
g consistent with both equation (8) and the
results of Expt 4. In order to find such a g we
first determined a functional form for $l_N(\Delta x)$.
While there are many different functions consistent
with the empirical constraints on l, we
selected the following spatial weighting function
generated by taking the difference between two
Gaussian functions:

$$\frac{l_N(\Delta x)}{l_1} = A_1\phi(\Delta x, \sigma_1) - A_2\phi(\Delta x, \sigma_2); \tag{9}$$

where $A_i > 0$ are the amplitudes, $\sigma_i > 0$ the standard
deviations of the positive (i = 1) and negative
(i = 2) Gaussians, respectively, and where $\phi$ is a
Gaussian density function
centered at the origin. This type of spatial
weighting function seemed plausible because,
given the correct parameters, it has been used
to describe other spatial interactions including
empirically observed receptive fields in monkey
and cat retinal ganglion cells (Enroth-Cugell
& Robson, 1966). The difference between
two Gaussian functions has also been used to
approximate psychophysically defined spatial
weighting functions (e.g. Schade, 1956; Wilson
& Bergen, 1979; Graham, 1980).

The best-fitting parameters to equation (9)
were derived for each subject using a
fitting procedure (STEPIT, Chandler, 1965) that
minimized the squared error between each subject's
data and equation (11). The resulting fits,
shown in Fig. 8, are quite reasonable. The
root-mean-square error for the fits is 0.05, 0.013,
0.058 and 0.025 for subjects JG, DP, EW and
JF, respectively.
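The quantities involved in such a fit can be sketched as below. This is an illustration only: the normalized form of the Gaussian density is an assumption (its exact form is not legible in this copy), the parameter values are made up, and the minimizer itself (STEPIT) is not reproduced. Any general-purpose minimizer could be applied to the error function.

```python
import math


def gaussian_density(x, sigma):
    """Zero-centered Gaussian density (assumed normalized form)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def difference_of_gaussians(dx, a1, sigma1, a2, sigma2):
    """Spatial weighting function: positive minus negative Gaussian."""
    return a1 * gaussian_density(dx, sigma1) - a2 * gaussian_density(dx, sigma2)


def sum_squared_error(params, separations, observed):
    """Squared error between observed relative intensities and the model."""
    a1, sigma1, a2, sigma2 = params
    return sum(
        (difference_of_gaussians(dx, a1, sigma1, a2, sigma2) - y) ** 2
        for dx, y in zip(separations, observed)
    )


def rms_error(params, separations, observed):
    """Root-mean-square error of the fit."""
    return math.sqrt(sum_squared_error(params, separations, observed) / len(observed))


# Hypothetical parameters and separations (deg), used only to exercise the code.
true_params = (0.5, 0.2, 0.1, 0.6)
separations = [0.06, 0.12, 0.18, 0.24, 0.36, 0.48]
observed = [difference_of_gaussians(dx, *true_params) for dx in separations]
```

A fitting routine would search over the four parameters for the values that minimize sum_squared_error, and rms_error gives the figure of merit reported per subject.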

Joyce E. Farrell et al.

Since the dependence of the luminance on $\Delta x$
can be characterized as a difference of two
Gaussian functions then the resulting gain function
g, shown in Fig. 8, is also a difference of
Gaussians but raised to a positive power $\beta$. This
function:

$$g(\Delta x) = k\,[A_1\phi(\Delta x, \sigma_1) - A_2\phi(\Delta x, \sigma_2)]^{\beta};$$

where k is a positive constant, appears to be a
reasonable reflection of the effect of spatially
adjacent stimuli on persistence. According to
these results, the width of the effective field
within which one stimulus line affects the persistence
of another is approx. 0.24 deg of visual
angle. Since the form of the gain function
appears to be reasonable we proceed to use the
data from Expt 4 to derive the temporal dependency
h. Note that the following derivation is
independent of the form of the gain function.

Derivation of temporal dependency. Assuming
that the gain model holds, the temporal waveform
of the underlying visual response to a
briefly presented visual stimulus is embodied in
the function h. To evaluate h we need to eliminate
g in equation (7). We accomplish that by
substituting, in equation (7), the expression for
g from equation (8). Empirically, this amounts
to combining the results of Expt 4 with those of
Expts 1 and 2.

To combine equations (4) and (8) we first take
the logarithm of both sides of equation (7),
and then solve for g with the result
$\log[g(\Delta x)] = \log(w) - \log[h(\tau)]$. Then, substituting
for $\log[g(\Delta x)]$ in equation (8) yields:

$$\log(w) - \log[h(\tau)] = \log(c_0) + \beta\log\left[\frac{l_1}{l_N(\Delta x)}\right],$$

which can be simplified to:

$$\log[h(\tau)] = k + \beta\log\left[\frac{l_N(\Delta x)}{l_1}\right]; \tag{10}$$
where k is a real constant. Estimating the
temporal decay function h consistent with our
results can be accomplished by finding a function
of $\tau$ which is linear in $\log(l_N/l_1)$.

For each subject and each condition of spatial
separation, $\log(l_N/l_1)$ was estimated by the 50%
threshold criteria of psychometric functions relating
the probability that the subject responded
"yes" (to indicate that all stimuli were visible) to
the ratio of the minimum and maximum stimulus
luminances, $l_N/l_1$. Figure 9 shows $\log(l_N/l_1)$
plotted as a function of $\log(1/\tau)$ (derived from
the data collected in Expt 2) for each subject.
The solid lines in Fig. 9 illustrate that the
following equation provided a reasonable fit to
the data for spatial separations less than or
equal to 0.24 deg of visual angle:

$$\log\left(\frac{1}{\tau}\right) = k + \alpha\log\left[\frac{l_N}{l_1}\right]. \tag{11}$$

We can therefore conclude that, to the extent
that this equation is supported by the data, the
gain model cannot be rejected (at least for small
spatial separations) and that the decay of visible
response has the general form $1/\tau$. This function
might not be a realistic impulse response for a
linear system, but it does indicate that the decay
of visible persistence is slower than a simple
exponential (cf. Rumelhart, 1969; Hawkins &
Shulman, 1979; DiLollo, 1984).
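The straight-line fit behind equation (11) can be sketched with ordinary least squares on log-transformed data. The numbers below are synthetic and chosen only so that the code can be checked; they are not the measured thresholds or persistence durations, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_decay(intensity_ratios, durations):
    """Regress log(1 / duration) on log(dimmest / brightest intensity)."""
    xs = [math.log(r) for r in intensity_ratios]
    ys = [math.log(1.0 / t) for t in durations]
    return fit_line(xs, ys)


# Synthetic data generated from log(1/tau) = k + alpha * log(ratio),
# with made-up k = -4.0 and alpha = 1.5.
ratios = [0.9, 0.7, 0.5, 0.4]
durations = [math.exp(-(-4.0 + 1.5 * math.log(r))) for r in ratios]
alpha, k = fit_log_decay(ratios, durations)
```

If the points for the small separations fall on such a line, the response amplitude at the moment it crosses threshold scales as a power of the intensity ratio, which is what links the luminance trade-off to a decay of the form 1/tau.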

Finally, Fig. 9 shows that for larger values
of $\tau$ (corresponding to stimulus conditions in
which $\Delta x$ was greater than 0.24 deg of visual
angle) there seems to be systematic departure
from the straight line. This represents the failure
of the model to capture spatial interactions over
larger separations.

GENERAL  DISCUSSION 

Persistence is a property of any linear system
with limited temporal bandwidth. Usually, the
narrower the bandwidth the longer the persistence.
Similarly, the more veridical the temporal
response of a system is, the less persistence there
is. In any sensing system, there is a trade-off
between the ability to reproduce the temporal
properties of a stimulus (achieved by broad
temporal bandwidth and, consequently, short
persistence) and the ability to detect the presence
of a weak stimulus in the presence of noise
(achieved by temporal summation and, consequently,
long persistence). There are many situations
in which the visual system sacrifices
temporal bandwidth in favor of stimulus sensitivity.
For example, the time constant of temporal
integration is more than two times longer
in the dark adapted eye than in the light adapted
eye (Sperling & Sondhi, 1968). We report an
instance in which, depending on the spatio-temporal
properties of the stimulus, the visual
system sacrifices either temporal bandwidth or
stimulus sensitivity. When the distance between
successive stimuli is small, as in the case of the
apparent motion of a single object, the visual
system sacrifices stimulus sensitivity in favor
of temporal fidelity, preserving the temporal
stimulus information and reducing the smear
that would otherwise be generated by moving
objects (Burr, 1980). When the distance between
successive stimuli is large, as in the case of
briefly presented stationary objects, the visual
system sacrifices temporal fidelity in favor of
stimulus sensitivity, allowing more time to
extract the spatial information necessary for
object identification.

Fig. 9. For each subject, the natural logarithm of $l_1/l_N$ (the ratio of maximum ($l_1$) and minimum ($l_N$)
stimulus luminance that accompanied the 50% visibility threshold criteria in Expt 4) is plotted as a
function of $1/\tau$ (the reciprocal mean duration of visible persistence estimated from the results of Expt 2).
The solid lines represent the linear regression of $\ln(l_N/l_1)$ on $\ln(1/\tau)$ for the data corresponding to conditions
in which adjacent stimuli were separated by distances less than or equal to 0.24 deg visual angle. The
solid circles falling near the regression line from left to right correspond to spatial separations of 0.06, 0.12,
0.18 and 0.24 deg visual angle, respectively. The unfilled circles correspond to spatial separations greater
than 0.24 deg visual angle.

We consider a simple gain model as a possible
mechanism for modulating the duration of visible
persistence as a function of the distance
separating stimuli. In this model, the shape of
the underlying visual response is preserved and
only its amplitude is modulated by the presence
of adjacent stimuli. Our analysis does not assume
any particular shape of the temporal
impulse response function. We only assume that
a briefly presented luminous line generates a
visual response that decays over time, and that
after some time the visual response reaches a
threshold below which it is no longer visible.
The effective duration of visible persistence
corresponds to the duration that the visual
response generated by the stimulus is above
threshold. In order to test the gain model, we
make the further assumption that the amplitude
of the visual response to briefly presented stimuli
increases with stimulus luminance and that
the effects of spatial separation, luminance and
temporal separation on visible persistence are
separable. The trade-off we observed between
the effects of spatial separation and stimulus
luminance on the duration of visible persistence
supports the assumption of separability and
the gain model. The gain model is appealing
because it can be realized by mechanisms underlying
shunting lateral inhibition (Sperling &
Sondhi, 1968).

This work was supported by the

American Sign Language (ASL) is a gestural form of communication
used by the North American deaf and hearing-impaired
communities. In free conversation, ASL is as rapid
a form of communication as most spoken languages, including
English (Bellugi & Fischer, 1972). Over the past decade
there have been several investigations of factors related to the
transmission of ASL over the existing long-distance communications
networks. The problem is to compress a video signal
of the signer to the extent that it will fit through a low-bandwidth
or low bit-rate communication channel, such as
an ordinary telephone line, without greatly disrupting the
efficiency of communication. Although previously designed
video telephones would suffice for communication, their
bandwidth requirements and cost made them impractical.
The current public telephone network has transmission limits
of 300 to 2800 Hz for analog signals and nominally 9,600 bits
per second (bps) for digital signals (ca. 1988).
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reqmremests  ke  ASL  wss  posed  by  S^aEs^  (1978).  wbo 
subseiysggtlv-cSaermioedtfcairtsartzbJysttrseia^fScocM 
convey  messagas  (^esEsg.  1930. 1931).  ^kxE:^ 

Landy, and Pavel (1985) investigated ASL intelligibility
versus bandwidth by using the most sophisticated
spatial image-compression and coding schemes then
available. In one condition, subjects were able to interpret
signs with normalized intelligibility of 0.86 relative to full-bandwidth
sequences, even though bandwidth had been reduced
to 2880 Hz. Related work by Abramatic, Letellier, and
Nadler (1982) with French Sign Language and by Pearson
(1981) with British Sign Language also offers the possibility
of substantial compression.

The relative success in ASL compression achieved by Sperling
et al. (1985) may be attributed to the large amount of
redundant spatial information within the ASL signal. Spatial
redundancy exists both across individual pixels and across
groups of pixels. Individual pixels are redundant when the
gray level of one pixel is predictive of the gray level of nearby
pixels. Groups of pixels are redundant when cues in one
region of the image yield predictions of what should appear
in other regions. For example, consider the particular configuration
of pixels that yields the form of an arm. To the degree
that the hand, elbow, and shoulder are resolvable, other arm
pixels are probably unnecessary and are therefore redundant.
Accordingly, one may discard some forms of spatial information
with the expectation that other, similar information
remains. This sort of spatial redundancy provides the basis
for the often surprising success of dynamic point-light displays
(Cutting, 1978; Johansson, 1973) in conveying rather complex
form information. Indeed, Poizner, Bellugi, and Lutes-Driscoll
(1981) demonstrated that such displays successfully con-
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bex3.  ISSO:  i.ls’cs  &  Z^sec.  I9S5).  Is  ocsx  issttscos. 

!»3ne.tb3:ss»i^sc;^)cniboibot!30cyib23aT3S(OT 
pecccftszl  c»)  2X  «SsKi%-  pBCshed  nnsd)  Elr  B3I&33 
(AKb.  1952:  HasSx.  I95S;  Jfcilsos.  HeirSddL  Bkxssa^ 
dtJo.  &  Ccssa.  I9S7).  Is»£x  2$  ettai  <23  he  pencted 
(bxed}.  ii  ihK  ihae  tsss  be  ocascesi  sasdss 

pxpenis  iha  tdka  the  evea  saoctoie  uiuaa  2  coooa 


scqseace. 

How does one locate and use such salient properties for
temporal compression? Interestingly, the problem of dynamic
sequence segmentation has been addressed in several domains
of study and for a variety of purposes. These include efforts
to construct bscaablLe motion representation schemes (Marr
& Vaina, 1980; Rubin & Richards, 1985), the development
of motion descriptions for robotics or artificial intelligence
Cniiba(fc20,19S3), and, as noted previously, causal attribution
and event perception in the field of social psychology (Heider,
1958; Newtson, 1973). In another ASL study, Gim (1982)
attempted to locate the boundaries between consecutive signs
in a stream of ASL images.


otdar.pr-Tfpr.iaiesgSoeasniei.rdbsrta^sSvtSoweaed 
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t^detraecfeoesstccy.bcshKSbaasdbcseeasehitraa^ 
asovheaeihebocsdtmraiKrapihcedL 

t3aseagcfcapirramre3;.Ke»3cB33dEe8yug(l97f) 

Asaedthas^jeasneaaDcctseaaeaesodiarai^jaataithe 

hvr.Ti4>-4i.erfhrh»»fea3gggt(fmj>jra"«^i5ii-ijtr..ii^  --i 

beaeeathfK-raadieisCaattgJipeia^sqBegiTSthntbr 
Iirrat'pmectbad2b^depeeofpy3dliT^aHsa*r- eeTbey 
tcC3dtb33dtk3ae»effcr.mrs6»ccg:cit&3S2abgaS:- 
pcc3S»eseBaxe3ccs3adrdgerardftndda3eeseffosaes 

r;exr<Tr-tnflheeecrneSfe-!A-tg-atprIe«smerec-vire«-rF-Ti- 
ia  their  dsg^rtacsofflbe  2:033.  taardtirscgxsra  as  poet 

9^  meg  pSTf  an  rV*' 

^Sea  totftongggsfegKao^cfccdifl’i^sg^sEsmfaa 
mgrgsboqaagbggyoegscfiirrgfiTrars  laccectpeAogsl 
{Ne«t»a  &  Essqe^  1976X  poegs  of  th;ss  beraVpoea 
&?rgtwgsgg:gd»fe^CQngfg^a2qr»tbgfors;saocs 
sesrgjces  &»  £b:y  ««rs  ds^a.  la  s&oci,  Ne«:soo 
aad  tas  ccC:ioga:ocs  pcmi3s5  a  coena.^<Sgaocg;q;Spa 
thsi  ssi^ssmdv  <5c£aed  bcsslpoesx  £ag3S  coevsy  xaSxBs- 
tioo  tbss  is  of  f?s2:£r  ss^oeusce  to  ibg  sSobsl  pereps  tbsa 
ssabfgaSyy.is  (Fcf  a  resSear  of  the  refe  of  bcs2tpoto  ia 
cvrsapezc:pc>XLSegNe»tsoaes2L,  19S7)l 

Choosing Significant Frames Automatically

In the present study, we measure the intelligibility of ASL
sequences constructed by using a subset of chosen frames
from the sequence. The work of Newtson and his
collaborators suggests that in segmenting the sequences, we
would do best to retain breakpoints and discard non-breakpoints.
Although the Newtson procedure for locating
breakpoints works well, its usefulness in real-time communication
is limited, because human observers must
select frames. Given the growing availability of digital
image-processing technology, it is natural to digitize the image
sequences to be transmitted, to compute which frames represent
breakpoints within the sequence, and to transmit the
chosen frames and discard the remainder. To implement such
a system, we must first consider the physical properties associated
with breakpoints: the boundaries of perceptual units.


Perceptual  Units  of  Behavior 

A technique for determining the location of boundaries of
perceptual units that has received considerable attention
comes from Newtson (1973) and his collaborators (Newtson
& Engquist, 1976; Newtson, Engquist, & Bois, 1977; Newtson
& Rindner, 1979; Rindner, 1982). Although Newtson's early
work was concerned with determining how the attribution
process varies as a function of the unit of perception, he was
also concerned with demonstrating that there was, in fact, an
objective basis for determining units of behavior. In his pro-

Physical  Characteristics  of  Breakpoints 

Two possible theories could account for the finding that an
action stream may be partitioned into discrete units of behavior
based on breakpoints (Newtson & Engquist, 1976). Either
the particular configuration of components in the scene constitutes
a distinctive state that identifies the breakpoint, or
actions are defined by state-to-state changes that are characterized
by successive breakpoints. In a test of these two
possibilities, Newtson et al. (1977) used a movement notation
system, designed for use by choreographers, to code the body
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?osae«cf^aa^«btte5cgagaM.D5gfersgr<beg»ega 
c:»£:gi  ae  gg&rra  pccsss  rs  ttse  teaSe  tiae  pckmb 
c^aaijss  cf  tSae  aeas::  tbs  ss:£c£  scsK&ss  cf eic&  scQssses  ss 
qgts9d.^cccya:=tpcsaaac&aa>sbc»esasaecss^ 

hrairpfrrss.  he»gja  h:xA';njt<  »bA 

becxesa  aatefy  cbcsca  accibsai^^ 

(1977)  ssscSss  ^ca»Ka3sd  screes  ss;9an  fix  cbe  sass-:»> 
saas  fc^Tc^csa  beg  sbjaed  po  eti&aee  fee  tbs  <£clagj>s 
sase^TCTtgys. 

Tbergsgjscfftbessesprflsarasagegagrycgssitepsa 

fe«  ibg  aaarf  <»*?y  f"j>T T <«yVH  TV 

cf^srabes  cf  tbs  bebatikx;  at  sgifrrghg  tceaipttas  (Ne»v 
SCQ  e3  al.  1977,  ISSTX  To  accosts  sascs^&aactbodr 
posfi&es  as  tcsa^pocsa;.  soeae  Cxa  cf  cast  have 
cesgsisd  begifcgga  tcsatpcba:  iba  is,  t2bc  pcmca  cfeaags 
ctssoa  fcygpea  jagargrasoc^.  Xsatsoo  Cl  A  (1977, 19S7) 
caspcgcdtbsrgegsfeofposa5oo<£asgso>srtinyasaa 
isdacfgaoaragatcocsySeiSqr.Tlapg2SWE.astSeypocsad 
ccx.  s  n^atsd  to,  fcei  iSSsTcsx  firoo.  oovrocot  esssbads. 
Tbs  ispceus:  that  «s  dsive  frees  t£ss  «oek  is  that  soeas 

fixes  ^  d}TrgsAr  asc^ity  sest  occsr  bet«esa  saccss^r 
fccEaipcccsL  Tbs  cs^xncal  obsesvatSca  scppoeis  oer  ssl^> 
me  ispcTSsSoa  that  secs  £:e3:er>ibas-a%ecsss  aeaocai  of 
3em%  occcs  bet^eea  bocadariss  aad  ibax  tbs  acfr^i^r 
dssrsases  ax  boeadaries^ 

Evidence of such activity must be available in the surface
characteristics of the sequence: Marr and Vaina (1980) formalized
motion in their state-motion-state representation
for segmenting a stream of movement into pieces that can be
described independently. They used pauses (described as moments
when the parts of a shape are either absolutely or
relatively at rest) to segment a motion stream. Pauses occur
when the object (or objects) in the scene undergo a change in
direction of movement and occasionally occur at other moments
in a sequence. This same notion appears in the work
of Rubin and Richards (1985), who argued that natural
motion boundaries occur at starts, stops, and force discontinuities.
Moreover, they provided evidence that human observers
have a subjective impression that a significant event has
occurred at each of these boundaries. All of these theories
imply that one ought to be able to locate event boundaries by
tracking some surface characteristic of the dynamic sequence
and by searching for frames that correspond to pauses in
activity. Rubin and Richards (1985) and Marr and Vaina
(1980) theoretically defined ways in which motion sequences
might be parsed. In this article we choose a simple realization
of these ideas and test its effectiveness for producing intelligible
subsampled motion displays.

The Activity Index, $a_\theta(n)$

The activity index is the fraction of pixels that experience
a suprathreshold change in luminance between frames n − 1
and n. We located event boundaries in ASL sequences by
computing this measure of activity between each pair of
consecutive frames in each sequence and looking for the local
minima. Our activity index was computed by counting the
number of pixels that underwent a significant change of gray
level between consecutive frames in a sequence: the fraction
cKvn 2sd Iwerjurr secrets ve saft^Br. In a typical ASL
sequence, several body parts move simultaneously. Thus, $a_\theta(n)$
The activity index $a_\theta(n)$ is the fraction of pixels in frame n
that experienced a suprathreshold change in luminance
The higher the threshold parameter $\theta$, the smaller the influence
of pixels that change as a result of camera or digitizing
noise rather than as the result of a moving object.

As an example, we consider a 30-frame sequence in which
a white square on a black background moves left and right
across the width of a frame sinusoidally. Figure 1 is a graph
of $a_\theta(n)$ for such a sequence. Note that the local minima in
Figure 1, where $a_\theta(n - 1) > a_\theta(n) < a_\theta(n + 1)$, correspond to
frames in the original sequence in which the direction of
motion is changing (i.e., the peaks and troughs of the sinusoid).
Complex movies such as ASL sequences produce complex
$a_\theta(n)$ functions with many more local minima.
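The activity index and the search for its local minima can be sketched as follows. The sketch assumes frames stored as lists of rows of gray levels; the function names, the one-row toy frames, and the threshold value are hypothetical and serve only to illustrate the definition.

```python
def activity_index(prev_frame, frame, theta):
    """Fraction of pixels whose gray level changed by more than theta."""
    changed = 0
    total = 0
    for row_prev, row_cur in zip(prev_frame, frame):
        for p, q in zip(row_prev, row_cur):
            total += 1
            if abs(q - p) > theta:
                changed += 1
    return changed / total


def activity_series(frames, theta):
    """Activity between each pair of consecutive frames.

    Element k of the result is the activity between frames k and k + 1.
    """
    return [activity_index(frames[n - 1], frames[n], theta)
            for n in range(1, len(frames))]


def local_minima(values):
    """Indices i with values[i - 1] > values[i] < values[i + 1]."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] > values[i] < values[i + 1]]


def make_frame(position, width=12, size=2):
    """Toy one-row frame: a white segment on a black background."""
    row = [0] * width
    row[position:position + size] = [255] * size
    return [row]


# The segment moves right, pauses at its turning point, then moves back.
positions = [0, 3, 5, 6, 6, 5, 3, 0]
frames = [make_frame(p) for p in positions]
series = activity_series(frames, theta=10)
minima = local_minima(series)
```

In this toy sequence the only local minimum falls at the pause where the segment reverses direction, mirroring the behavior described for the moving square. The strict inequalities mean that a plateau of equal values yields no minimum, which a practical implementation might want to handle separately.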

Figure 2 shows the $a_\theta(n)$ function for a 70-frame ASL
sequence that shows the sign for the word accident. A drawing


Figure 1. Activity index as a function of frame number for a small
square that moves through two cycles of a sine wave across 30 frames.
(The local minima correspond to the resting points, or points of
change in direction, of the moving object.)
change  tn  direction,  of  the  moving  ot^cct) 
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FiFffe2.  Ac^ntyiadaasafeaeSsooffoselbcaTO^WksASL 
se4ac3Xi2ss3da9«sibesta£xt&eEx9^sbweedda£d!n:2.(Tbeb::^ 
aaacs»cfaea*iyaithefrfp'!:^sgcaadcadoftf»efeag:Soococre- 
!90od  to  ibe  acdoo  of  the  s^per  s  sbe  caotts  aad  oct  oTa  ms 

pcfiooa  [ams  fel&d  to  &oc:V  Tbs  t»o  local  ssaasa  as  Fnacs  2S 
a=d  40  corrcspoad  to  e30s:ss:»«bca  she  spet's  baads  are  fcnhess 
aod  »bcs  they  scst  S3  tbs  craddle,) 


of the sign is given in Figure 3a, and every third frame is
shown in Figure 4. In our experiment, the signer assumes a
rest position in which her arms are folded in front of her at
the beginning and end of the sequence. The same rest position
is used for all signs in order to remove any potential ambiguity
concerning the beginning and ending of each sign and to
ensure that on repeated presentations, the particular frame
on which a sign begins or ends does not provide a clue to the
identity of the sign. In the sign that produced Figure 2, the
signer raises her two hands to either side, closes them into
fists, and moves them until they meet in front of her (this is
an iconic sign for a collision), and then reassumes the rest
position with arms folded.

The a_θ(n) function in Figure 2 is instructive for several
reasons. First, there is always a high activity value at the
beginning and end of each sequence as the signer moves out
of and into the rest position. In Figure 2, these peaks occur
at Frames 23 and 53. Moreover, at the beginning and end of
the sign, the activity index becomes a collection of closely
spaced local minima. These frames correspond to the rest
position in which luminance noise and slight movements on
the part of the signer account for fluctuations in a_θ(n), and
which might be misinterpreted as significant activities (i.e.,
these frames might be selected).

Activity-index subsampling. Activity-index subsampling
means selecting for presentation only the frames for which
a_θ(n) has a relative minimum as a function of n. To control
the coarseness with which candidate minima are sampled, we
introduced a parameter α that specifies the minimum increase
in a_θ(n) that must occur between consecutively chosen frames;
that is, in order to choose both frames i and j, where i < j,
the activity index must rise above a_θ(i) + α for some frame k,
where i < k < j. Note that this method of sampling is
asymmetric with respect to time; inverting the order of frames
may lead to a different selection.
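
The selection rule can be sketched as a single forward pass over the activity values. This is one plausible reading of the rule, not the authors' implementation: it assumes strict local minima, never selects the first or last frame, and accepts the first minimum unconditionally. The function name and the toy activity values are hypothetical.

```python
def activity_index_subsample(a, alpha):
    """Greedy forward selection of local minima of the activity index.

    A minimum j is kept after the previously kept frame i only if the
    activity rose above a[i] + alpha somewhere strictly between i and j.
    Assumption: endpoints are never treated as minima.
    """
    chosen = []
    peak_since_last = float("-inf")
    for n in range(1, len(a) - 1):
        if chosen:
            peak_since_last = max(peak_since_last, a[n])
        is_local_min = a[n - 1] > a[n] < a[n + 1]
        if is_local_min and (not chosen or peak_since_last > a[chosen[-1]] + alpha):
            chosen.append(n)
            peak_since_last = float("-inf")
    return chosen


# Hypothetical activity values for a 7-frame sequence.
a = [0.5, 0.2, 0.3, 0.25, 0.6, 0.1, 0.4]
forward = activity_index_subsample(a, alpha=0.2)

# Run the same rule on the time-reversed sequence, then map the
# selected indices back to the original frame numbering.
backward = activity_index_subsample(a[::-1], alpha=0.2)
backward_in_original = sorted(len(a) - 1 - n for n in backward)

print(forward)               # [1, 5]
print(backward_in_original)  # [3, 5]
```

The two selections differ, which illustrates the time asymmetry noted above: the shallow minimum at index 3 is rejected going forward (the rise after index 1 is too small) but accepted going backward.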


The two large local minima that occur at Frames 28 and
40 in Figure 2 correspond to the point in the sign when the
signer's hands are spread apart and to the frame in which they
meet. In other words, if we choose frames that correspond to
the two local minima at Frames 28 and 40, as well as a
beginning and ending frame, we satisfy the criteria for
intelligent temporal subsampling while reducing the sequence from
70 to 4 frames. These 4 frames are illustrated in Figure 5a.

Rotational and circular motions. Because the activity
index relies on changes in activity to indicate event boundaries,
rotational or smooth circular movement presents a problem: An
activity index may not achieve a significant local minimum
during such a motion. Marr and Vaina (1980) recognized the
same shortcoming in their state-motion-state representation
and proposed to handle these instances by recognizing the
occurrence of a confounding movement and dealing with it
separately. Our a_θ(n) activity index treats rotational and
circular motions the same as all others. If we were to discover
that subsampled signs never reached some minimum criterion
of intelligibility, it might indicate that rotational and circular
motions occur often enough within ASL to merit special
consideration. However, an informal survey of signs suggests
otherwise.


Figure 3. Two illustrated signs showing (a) a simple sign, ACCIDENT
(from The Joy of Signing [p. 102] by L. L. Riekehof, 1980, Springfield,
MO: Gospel Publishing House. Copyright 1980 by Gospel Publishing
House. Reprinted by permission), and (b) a compound sign,
INFORM-YOU-TWO (from A Basic Course in American Sign Language [p. 158]
by T. Humphries, C. Padden, and T. J. O'Rourke, 1980, Silver Spring,
MD: T. J. Publishers. Copyright 1980 by T. J. Publishers. Reprinted
by permission).
Figure 4. Sequence of digitized images of the sign accident. (Every
third frame of a 70-frame sequence is shown. This is constant
subsampling.)


Constant  Subsampling 

To measure the usefulness of activity-index subsampling, it must
be compared with an alternative method of temporal
compression. Although the focus of their work was spatial
rather than temporal compression, Sperling et al. (1985) and
Pearson (1981) used a simple frame repetition, what we here
call constant temporal subsampling. By this method, every
mth frame is chosen from the sequence, where m can take on
any value between 2 and the total number of frames in the
original sequence. Constant subsampling will be used as the
basis of comparison in the present study. Figures 4 and 5b
illustrate constant subsampling for m equal to 3 and 23,
respectively. In Figure 5b, note that the second frame catches
the signer in the middle of a movement.
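
For comparison, constant subsampling reduces to taking every mth frame. The sketch below assumes that selection starts with the first frame (index 0), which the text does not state explicitly; under that assumption a 70-frame sequence yields 24 frames for m = 3 and 4 frames for m = 23, matching the frame counts of Figures 4 and 5b. The function name is illustrative.

```python
def constant_subsample(num_frames, m):
    """Indices (0-based) of every m-th frame, starting with the first frame."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return list(range(0, num_frames, m))


every_third = constant_subsample(70, 3)
coarse = constant_subsample(70, 23)

print(len(every_third))  # 24
print(coarse)            # [0, 23, 46, 69]
```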


Dynamic  Display  Considerations 

Having chosen a subset of the frames from a sequence of
N frames, how should we choose the duration of each frame
to ensure that the displayed sequence retains as much of the
rhythmic properties of the original as possible? Temporal
constancy is preserved in constant subsampling by choosing
the number of repetitions for each frame equal to the constant
sampling factor. For example, if every third frame were chosen
from the original sequence, each frame in the displayed
sequence would be repeated three times.

Because  the  frames  chosen  from  a  complex  scene  via  the 
activity  index  are  not  necessarily  separated  by  a  constant 
number  of  frames,  the  repetition  factor  for  display  must  vary 
according  to  the  location  of  the  chosen  frame  in  the  original 
Figure 5. Four-frame representations of the sign accident: (a) full
gray scale using activity-index subsampling, (b) full gray scale using
constant subsampling, and (c) binary images using activity-index
subsampling.


sequence. We repeat each chosen frame (to replace discarded
frames) until the next chosen frame occurs. For example, if
Frames 1, 5, 15, and 22 were chosen from a 30-frame sequence,
Frame 1 is repeated 4 times, Frame 5 is repeated 10
times, Frame 15 is repeated 7 times, and Frame 22 is repeated
9 times, to reach the total of 30 frames that appear in the
original sequence. In this method, a different display sequence
would be produced from the same sequence of selected frames
when played in the forward rather than the time-reversed
direction.
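
The repetition rule in this example can be written out directly. The sketch below uses the 1-based frame numbers of the text and assumes that the first chosen frame is Frame 1, as in the example; the function name is illustrative.

```python
def repetition_counts(chosen, total_frames):
    """Number of times each chosen frame is shown.

    `chosen` holds 1-based frame numbers in increasing order. Each frame
    is held until the next chosen frame; the last one is held to the end.
    """
    counts = []
    for position, frame in enumerate(chosen):
        if position + 1 < len(chosen):
            next_frame = chosen[position + 1]
        else:
            next_frame = total_frames + 1
        counts.append(next_frame - frame)
    return counts


counts = repetition_counts([1, 5, 15, 22], total_frames=30)
print(counts)       # [4, 10, 7, 9]
print(sum(counts))  # 30
```

Because each frame is held forward in time until its successor, playing the selected frames in reverse order would assign the holds differently, which is the asymmetry described above.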

Static  Presentation 

Optimal Number of Frames

ASL-related investigations have, to this point, focused
exclusively on the transmission and intelligibility of dynamic
images. There are, however, several compelling reasons for
studying the intelligibility of ASL when it is presented in static
form. Most important, static images are used in most, if not
all, ASL textbooks and dictionaries (e.g., Humphries, Padden,
& O'Rourke, 1980; Riekehof, 1980). Important exceptions
are the books produced by Stokoe and his collaborators
(Stokoe, 1974; Stokoe, Casterline, & Croneberg, 1976), which
use written symbolic notation to convey the motion and hand
shape of each sign.

The type of static presentation that is most often seen in
standard ASL textbooks is a single-frame image that
corresponds roughly to a single English word or expression.
Typically, an illustrated signer is presented with overlaid arrows
and "strobe" lines to indicate the desired hand, finger, and
arm motions. An example of one such illustration, from
Riekehof (1980), appears in Figure 3a. For simple signs,
especially those that use only one hand, these illustrations are
quite efficient. Difficulties can arise, however, for compound
signs that require a change in hand shape or for the occasional
presentation of complete sentences. In these instances, such
as for the sign depicted in Figure 3b (taken from Humphries
et al., 1980), the many strobe lines and arrows mask the
intended movement and make it difficult for students to
replicate the sign.

An alternative to presenting a single frame for each English
word or phrase is to present the frames arranged adjacently,
as in comic strip format (see Figures 4 and 5). Given the
impracticality of displaying the hundreds of frames that may
constitute a single sentence, it is necessary to choose a subset
of frames for presentation. How does one choose frames in
order to convey a sign? Obviously, this is the static analog to
the dynamic display question that has been addressed earlier and,
coincidentally, is a question of great importance to animators
and cartoonists. When depicting an action sequence,
animators are taught to represent the extremes of the activity first,
and then to fill in with in-between frames as needed (Levitan,
1960). In the context of ASL, if frames are chosen to
successfully convey the motion when displayed dynamically, do these
frames convey the same information when displayed
statically?

Spatial-Temporal Compression Trade-Off

Finally, it is useful to investigate interactions between
spatial and temporal compression. A practical application would
probably combine temporal and spatial compression in order
to avoid the degrading effects of removing too much of either
spatial or temporal information. Here, we measure
intelligibility for both full gray-scale ASL sequences (8 bits/pixel) and
for the same sequences made binary with an edge detection
scheme. Each image is convolved with a Gaussian-smoothed
Laplacian and then thresholded so that 10% of the values are
set to black, generally from the dark side of image edges. The
result is a binary, line-drawn image (with approximately 0.211
bits/pixel). An example of such a binary sequence is shown
in Figure 5c. (These sequences and the nominal data rate
were taken from Sperling et al., 1985, Experiment 2,
Condition H.)

Method 

Subjects 

The 32 subjects used in this study were recruited in various places,
including the New York Society for the Deaf, the New York University
Office for Disabled Students, and word of hand among the deaf
community. Several fluent hearing ASL interpreters were also used.
The mean age of our subjects was 33 years (ages ranged from 18 to
52); they had been signing for an average of 18 years. Twelve
native signers, those who were raised in homes where ASL was the
primary language, were included in the study.

Stimuli 

The stimulus set consisted of 84 ASL signs, each of which
corresponds roughly to a single English word. All signs were taken from
Sperling et al. (1985), who filmed, digitized, and applied various
image transformations to the signs for use in their study of trade-offs
between ASL sign intelligibility and bandwidth. The signer was filmed
from approximately 10 ft (approximately 3.05 m) away, so that the
upper body and head filled the view-finder of the camera. During
filming, the signer stood behind a screen with a 12 x 18 in. aperture,
wore dark clothing, and had dark hair; these conditions ensured that
the hand and face of the signer would be highlighted. Each digitized
frame was subsequently cropped to 96 x 64 pixels; the signer was
centered in each frame so that the area from her waist to the top of
her head was visible.

Along with the original full gray-scale (denoted FGS) movies of
each sign, we used signs that had been transformed from the FGS to
the line-drawn, binary images previously described. Such signs will
be referred to as BIN (for binary) signs. Sperling et al. (1985) reported
an intelligibility of .911 for BIN signs, normalized against the
percentage correct for 96 x 64 pixel FGS signs. At the time of the
experiment, four FGS signs were not available under the BIN image
transformation; the BIN conditions used four signs that did not
appear in FGS conditions, and vice versa. The list of 84 signs used in
the study appears in Table 1, divided into the stimulus blocks used
in the experiment.

Although the term frames per second (fps) is used throughout the
remainder of this article to describe the degree of temporal
compression, all dynamic stimuli were presented on a system that always
displayed 60 fps. In the context of the present study, fps refers to the
number of new frames per second, computed by dividing the number
of chosen frames by the duration of the original (or the derived)
sequence. Frame rate (fps) was varied parametrically. The parameters
m and α control the frame rate for fixed-rate and activity-index
subsampling, respectively. Four different values were used for each
of these two parameters. This manipulation allows us to collect data
over a large range of intelligibility. The parameter values and the
average number of frames per second for each scheme are displayed in
Table 2.
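
The definition of new frames per second can be sketched as follows. The 60-fps display rate comes from the text; the 70-frame sequence length is borrowed from the accident example, and the function name is illustrative.

```python
def new_frames_per_second(num_chosen, num_original, display_rate=60.0):
    """Chosen frames divided by the duration of the original sequence.

    The duration is num_original / display_rate seconds, so the ratio
    simplifies to num_chosen * display_rate / num_original.
    """
    return num_chosen * display_rate / num_original


# Constant subsampling with m = 2 keeps 35 of 70 frames.
print(new_frames_per_second(35, 70))  # 30.0

# Keeping only 4 of 70 frames, as in the four-frame accident example.
print(round(new_frames_per_second(4, 70), 2))
```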


Table 1

Stimulus Blocks

Blocks 1 through 8 (signs listed in reading order):

telegraph, wrong, sit, emphasize, punishment, bear, tobacco, summer,
leave, general, cheese, wife, apple, kill, thanks, think,
deaf, girl, until, world, uncle, n.g, home*, talk,
finish, short, shoe, our, screwdriver, sorry, flower, wrestling,
plan, week, wait, accident, guilty, tree, member, love,
“8iy, noon, picture, hospital, paper, bread, challenge, read,
train, preach, month, Friday, understand, behind, pay, start,
relax, red, steal, cancel, yesterday, everyday*, fun, color,
mother, machine, program*, because, letter, eye, which, before,
jump, improve, spend, c«s*, boss, bored, cop, movie*,
grow, pout*, alive*, lousy*

* Signs that appeared only in BIN (binary) conditions. ^ Signs that
appeared only in FGS (full gray-scale) conditions.
Table 2

Stimulus Transformations and Frame Rates

Scheme            α or m    Frames per second
Activity index    0         10.4
Activity index    0.02      8.45
Activity index    0.05      6.75
Activity index    0.1       5.4
Constant          2         30
Constant          4         15
Constant          7         8.4
Constant          11        5.5

Note. α is the parameter that controls the number of samples
used by the activity-index sampling scheme; m is the parameter that
governs the number of frames chosen by the constant subsampling
scheme.


Procedure 

The ASL signs were divided into eight groups of 10 signs, each
group balanced for difficulty by the criterion of Sperling et al. (1985).
The experimental variables included two image types, FGS and BIN;
two presentation modes, dynamic (D) and static (S); two subsampling
schemes, constant and activity index; four frame rates; and 10
stimulus blocks. A full-factorial experiment on these factors would require
320 subjects with only 1 subject in each cell. To achieve a more
manageable study, we ran four separate groups of subjects, one for
each combination of image transformation and presentation mode
(FGS-D, FGS-S, BIN-D, and BIN-S).

To make the most efficient use of each subject, the remaining
factors within each of the four groups were subjected to a
Greco-Latin design in which subsampling scheme and compression factor
were fully randomized and order of presentation was partially
randomized. In other words, each stimulus block of 10 signs was paired
one time with every combination of subsampling scheme and
compression factor over the course of the experiment. For
convenience, the combination of subsampling scheme and compression
factor is referred to as the stimulus transformation; every
transformation appeared in each ordinal position of stimulus presentation.
Order of presentation is only partially randomized, because sequence
effects are not balanced in this design. Each subject saw eight complete
stimulus blocks, each block having undergone a different
transformation (i.e., repeated measures over transformation and stimulus
block). A total of 32 subjects were required for a single replication
through each of the four 8 x 8 Greco-Latin squares.

Intelligibility test. All stimuli were processed with the HIPS
image-processing software (Landy, Cohen, & Sperling, 1984a, 1984b)
and were presented on a computer-controlled graphics display
processor (Adage RDS-3000 image-processing system). Images were
viewed on a Conrac 721109 monitor, set so that the mean
luminance of the display was equal to 55 candela per square meter
(cd/m²). Subjects were seated approximately 1 m from the screen, though
they were free to move to their most comfortable distance. (Parish &
Sperling, 1987, demonstrated that for stimuli whose visibility is
impaired by noise, viewing distance, over an extremely wide range,
is immaterial.)

For all conditions, subjects were required to respond to each ASL
presentation with an English gloss for the presented sign. Subjects
were told that each sequence contained only a single sign and that
each sign corresponded, roughly, to one English word. In most cases,
subjects wrote their responses on an answer sheet. In cases in which
deaf signers did not possess English skills that were advanced enough
to allow them to respond with a written word, they would sign the
response to an ASL interpreter, who then recorded the English
equivalent. The interpreter confirmed these subjects' understanding of the
sign by having them either use the word in a sentence or further
elaborate on the meaning of the word. Finally, all subjects were told
that if they had no idea what the correct answer was, they did not
have to respond.

Dynamic presentation. The word begin appeared on the monitor,
signaling the subject to press any button on a five-button keypad.
After the button press, the screen was cleared, and a white cue spot
appeared for OJ s. This was followed by a 0.5-s blank interval and
the presentation of an ASL movie (frame sequence). The sequence
was shown once, with the frames repeated as necessary to retain the
duration of the original image sequence, and was followed by a blank
screen. The word wait was displayed until the next sequence was
ready for display (2 or 3 s), at which point the word continue appeared.
While waiting for continue to appear, we recorded the subject's
response. After the subject's response and after the word continue
appeared on the screen, the subject was free to press any button to
initiate the next trial.

Static presentation. As with the dynamic presentation, the initial
button press erased the word begin from the screen and caused the
stimulus to be presented, though without a cue spot. The frames of
each movie were arranged in order by rows and columns, from left
to right and from top to bottom. Up to seven frames appeared in
each row. A sample "page" of 24 frames for the sign accident is shown
in Figure 4. Shorter pages are shown in Figure 5 for several conditions.
On presentation, the subject scanned the page and decided on a
response. Before writing or signing the response, however, a second
button press was required to erase the screen. After responding, the
word continue appeared on the screen. The next button press initiated
the next trial.

Results 

Scoring 

The measure of performance for all subjects and conditions
is percentage correct. For some of the signs used in the
study, several English responses are considered correct, a
result of the historical and regional development of ASL. Each
subject's answer sheet was scored by a congenitally deaf signer
who is fluent in ASL.

Subject  Comparison 

To assess the general ability of the subjects in this study,
we may compare their performance for dynamic sequences
with the performance of the subjects from Sperling et al.
(1985), who viewed similar sequences. For the most richly
subsampled full gray-scale sequences, which averaged about
20 frames, subjects in the present study averaged 86% correct,
nearly identical to the Sperling et al. subjects, who averaged
about 87% correct. For binary images, subjects in the present
study averaged 70%, in comparison with Sperling et al.'s 80%
correct. Our subjects may have been somewhat less skilled
than those of Sperling et al., perhaps a result of the mix of
native and nonnative, hearing and nonhearing signers used
in the present study, as opposed to the more homogeneous
group of deaf signers used by Sperling et al.

Main Effects

The data in each of the four Greco-Latin squares,
distinguished by the combination of image type and presentation
Figure 6. Mean subject performance as a function of the mean number of
frames per second for each of the four main conditions of the experiment
and for each subsampling scheme: (a) full gray-scale images in dynamic
presentation (FGS-D), (b) full gray-scale images in static presentation
(FGS-S), (c) binary images in dynamic presentation (BIN-D), and (d)
binary images in static presentation (BIN-S). (The open dots on each
graph represent performance with activity-index sequences and the solid
dots represent constant subsampled sequences. The vertical bars
represent the standard error of the mean.)


mode, are displayed in Figure 6. Probability correct is
displayed as a function of mean number of frames, averaged
across the 8 subjects within each design. These data were
subjected to an arcsine transformation in order to decorrelate
mean and variance; the arcsine data were used in the
subsequent analyses. Main effects for each individual Greco-Latin
square were evaluated by an analysis of variance.
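
The text does not give the exact form of the arcsine transformation. A common variance-stabilizing form for proportions is the arcsine of the square root, sketched below under that assumption; some treatments multiply the result by 2, which would not change the outcome of an analysis of variance.

```python
import math


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion in [0, 1].

    Assumption: this is the common asin(sqrt(p)) form, not necessarily
    the exact variant used in the original analysis.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# Proportions near 0 or 1 are stretched relative to those near .5,
# which reduces the dependence of variance on the mean.
for proportion in (0.05, 0.50, 0.95):
    print(proportion, round(arcsine_transform(proportion), 3))
```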

The four main effects for each Greco-Latin square are
subjects, order, stimulus block, and image transformation.
The most stringent assumption made by the analysis is that
the interactions among the four main effects of the
Greco-Latin square are negligible (Winer, 1971). Every effort was
made to ensure negligible interactions. Subjects were assigned
randomly to each cell, and stimulus blocks were balanced for
difficulty. There is no a priori reason to assume that there
would be significant interactions.

Stimulus transformation, which includes both subsampling
scheme and compression factor, was a significant factor for
the FGS-D, BIN-D (p < .01), and FGS-S (p < .05) conditions
but not for the BIN-S condition, although there was a trend
in the expected direction. Stimulus block was significant for
all four designs (p < .05). The fact that the stimulus blocks
differed from each other indicates that our efforts to equate
the blocks for difficulty were not entirely successful. This is
not surprising, because Sperling et al. (1985) also found a
significant effect of stimulus block despite similar efforts. We
rely on the fact that throughout the course of the experiment,
all stimulus blocks were presented in all conditions, thereby
allowing block effects to balance out. Finally, there were
significant subject differences (p < .01) for static presentation
of both image types (FGS and BIN).


Subsampling  Scheme 

The data from the full gray-scale conditions, seen in Figures
6a and 6b, suggest that the activity-index sequences were
more intelligible than constant subsampled sequences. Ideally,
we would have had data from both schemes at the same frame
rate to allow us to directly test this hypothesis. Unfortunately,
the nature of the subsampling schemes prevents such sampling
precision. To conduct the test, we used linear interpolation to
estimate performance for 6.75 fps for constant subsampling.
Activity-index data had already been collected at this frame
rate. For each presentation format, a t test was computed with
the interpolated constant-subsampling data and real
activity-index data. Both tests strongly reject the null hypothesis
(p < .01). For dynamic presentation of binary images (Figure 6c),
data interpolated at 8.85 fps for constant subsampling also
reject the null hypothesis (p < .05).
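
The interpolation step can be illustrated as follows. The two frame rates are read here as 5.5 and 8.4 fps, the constant-subsampling rates on either side of 6.75 fps, and should be treated as an assumption; the two scores are placeholder values chosen only to make the sketch runnable and are not data from the experiment.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Placeholder (hypothetical) proportion-correct scores at the two
# neighbouring constant-subsampling frame rates.
low_rate, low_score = 5.5, 0.50
high_rate, high_score = 8.4, 0.70

estimate = interpolate(6.75, low_rate, low_score, high_rate, high_score)
print(round(estimate, 3))
```

The interpolated value stands in for the constant-subsampling score at the frame rate where activity-index data were actually collected, so that the two schemes can be compared at a matched rate.
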
For dynamic presentation, activity-index performance is
estimated, by averaging between points, to be about 8% better
than for constant subsampling (for the portions of the curves
that overlap). This estimate reflects, in part, the crossover
interaction that occurs at the lowest frame rate, in which
constant subsampling outperforms activity-index
subsampling. This crossover interaction almost certainly comes from
the fact that the activity-index scheme chose frames nearer to
the beginning and end of the original sequence when working
at extremely coarse sampling rates, whereas constant
subsampling chose frames uniformly throughout the sequences. This
tendency of activity-index subsampling was due to the large
movements that occurred as the signer moved in and out of
the rest position, producing the only activity values that rose
above α. If this artifactual performance is discounted, that
is, if the beginning and ending rest positions are removed
from consideration, the estimate of overall activity-index
superiority to constant subsampling increases to 15%.

The form of activity-index performance varies with
stimulus presentation. For static presentation of full gray-scale
images (Figure 6b), activity-index performance rises above
that of constant subsampled images as the total number of
frames decreases; the estimated difference in performance
rises by nearly 20% when 6.75 fps are displayed.
Activity-index performance, however, falls off sharply with fewer and
greater numbers of frames per second. Interestingly, for
activity-index sequences, FGS-S has a performance maximum at
about 7 fps; for constant subsampled sequences, performance
with FGS-S improves monotonically with frames per second.
In contrast, static presentation of binary images (Figure 6d)
produces flat and nearly equal performance for both
subsampling schemes, reflected in the nonsignificant transformation
factor in the analysis.

Discussion 

Structure of Events

Although the experiment described here is not a direct test
of the validity of Newtson's (1973) definition of breakpoints,
there is certainly a close relation between their action-unit
boundaries and our basis for dynamic sequence segmentation.
Insofar as such a comparison may be made, the results
reported here generally confirm findings of Newtson and his
collaborators with regard to the perceptual salience of
boundaries and their ability to convey critical event information.
Indeed, in one condition, ASL sequences that were
constructed via the activity index were as intelligible as the
original sequences from which the frames were taken, despite
a four-fold reduction in the number of frames. This result is
similar to an intelligibility-rating result of Newtson and
Engquist (1976) and yet, because of the objective nature of
ASL intelligibility, is not open to the questions that follow
the subjective rating paradigm used by Newtson and Engquist.

Direct  Perception  of  Events 

A long-standing argument in theories of event perception
revolves around the issue of whether events are directly
perceived, originating in Asch's (1952) theory that
action-defining gestalten appear in the behavior sequence, or whether
event perception is more of an interpretive, cognitive process.
If action is directly perceived, it must be the case that the cues
that give rise to the percept exist in the surface structure of
the behavior sequence; that is, a necessary condition for the
direct perception of events is that the basis for event structure
must appear in the stimulus itself. This is the explicit
assumption behind the behavioral segmentation method of Newtson
(1973), and it is well supported by the many subsequent
experiments by Newtson and his collaborators. If complete
event information did not exist in the surface structure of the
sequences used in the present experiment, it would have been
extremely difficult, if not impossible, to segment our ASL
movies into intelligible sequences. At the very least, some
higher level, interpretive driver would have been necessary in
order to produce compressed images that were more
intelligible than those produced by constant subsampling. However,
it is clear from the results of our experiment that the necessary
event information does reside in the surface structure of
sequences.

Even if it is conceded that events are directly perceived and
that critical event information is carried by event boundaries,
we would expect static presentation of behavior sequences to
require a more interpretive process than does dynamic
presentation of the same sequences. That is, events are usually
not directly perceived with static presentation (Newtson &
Engquist, 1976). Indeed, for dynamic presentation, the change
in surface structure from one moment to the next is
immediate, whereas for static presentations, the change must be
inferred from an analysis of frames. That these are
fundamentally different processes is reflected in the demonstration of
left-hemispheric advantage for statically presented signs and
the absence of lateral asymmetry for dynamic signs (Poizner,
Battison, & Harlan, 1979). Because the inference process
would certainly introduce an additional source of error, it
seems likely that static images would be less efficient at
transmitting the desired information. Accordingly, we note
the generally lower intelligibility scores for static images in
the present experiment.

The current experiment supports the notion that the basis
for event structure appears in the stimulus itself. We found
that the ASL events isolated by a simple image-based computation
seem to agree with subjective impressions of event
structure. This does not preclude the possibility that other
sources of information, including higher reasoning, can act to
modify the interpretation of an event. Nonetheless, our findings
bode well for efforts in artificial intelligence directed
toward machine interpretation of actions.

ASL  Primitives 

A central component in the traditional study of ASL is the
use of ASL primitives. Stokoe (1974) developed a set of
primitives to describe signs that are composed of a limited set
of movements, hand shapes, and locations of articulation.
These components are meaningless when taken individually;
when combined according to rule-governed constraints, they
form the lexical basis of ASL. This is entirely analogous to
the function of phonemes in spoken language. A fourth ASL
dimension, hand orientation, has subsequently been added to
the list of primitives (Battison, 1974).

A somewhat remarkable result is the high intelligibility of
the ASL sequences at extremely coarse sampling rates. Even
in the most degraded condition, subjects correctly interpreted
nearly a third of the signs. An explanation of these findings
may stem from the relative importance of the four ASL
primitives. Consider that a single frame taken from the middle
of a sequence will likely convey information about three of
the four primitives: hand orientation, hand shape, and location
of articulation. Only motion is lost, or at the very least,
severely degraded. The fact that subjects do so well with this
limited amount of information reflects the degree to which
nonmotion factors play a critical role in ASL intelligibility.
Indeed, several studies have shown that the four primitives
are not equally perceptible, nor are they equally important
for intelligible ASL (Klima & Bellugi, 1979; Tartter & Fischer,
1982).

Image  Sequence  Compression 

Activity index: Dynamic sequences. It is apparent from
the data and from subjective reports that for low frame-rate
conditions, ASL sequences that have been subsampled with
the activity index are more intelligible than those subsampled
by a constant factor. In the full gray-scale dynamic condition
(Figure 6a), activity-index sequences were correctly identified
Z0% of the time at slightly less than 11 new fps. When
constant subsampling was used, the same performance level
was not achieved until an estimated 20 to 25 new fps were
displayed. At this criterion of performance, the number of
to-be-transmitted frames was reduced by a factor of 1.8 to 2.25,
roughly a twofold improvement over constant subsampling.

What are the implications of our findings? An 8-bit sequence
with frames of 96 x 64 pixels shown at 30 fps requires
1.47 Mbits/s for full bandwidth transmission, more than 300
times the nominal capacity of the public switched telephone
network. The large bulk of compression needed to transmit
ASL sequences can certainly come from spatial compression
and efficient data-encoding schemes, as demonstrated by
Sperling et al. (1985). Nonetheless, sharing the effects of
compression among spatial and temporal domains reduces
the reliance on spatial compression, thereby reducing the
amount of spatial information loss. Furthermore, it is easy to
conceive of environments in which it is desirable for dynamic
information to be transmitted, or encoded, as efficiently as
possible. Intelligent temporal subsampling would have to be
included in any such scheme.
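
The bandwidth figures in this passage can be checked with a few lines of arithmetic. The sketch below is illustrative only: the frame size, bit depth, and frame rates come from the text, whereas the telephone-line capacity of 4,800 bits/s is an assumption chosen to be consistent with the stated ratio of "more than 300 times" (the text does not give the capacity itself).

```python
# Raw bit rate of the uncompressed sequence described in the text.
WIDTH, HEIGHT = 96, 64      # pixels per frame (from the text)
BITS_PER_PIXEL = 8          # 8-bit gray scale (from the text)
FRAME_RATE = 30             # frames per second (from the text)

# ASSUMPTION: nominal line capacity in bits/s; not stated in the text.
ASSUMED_LINE_CAPACITY = 4800

raw_bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAME_RATE
ratio_to_line = raw_bits_per_second / ASSUMED_LINE_CAPACITY

# Frame savings at the criterion discussed in the text: about 11 new fps
# (activity index) versus an estimated 20 to 25 new fps (constant subsampling).
savings_low = 20 / 11
savings_high = 25 / 11

print(raw_bits_per_second)                    # bits per second, uncompressed
print(round(ratio_to_line, 1))                # multiple of the assumed capacity
print(round(savings_low, 2), round(savings_high, 2))
```

Under these assumptions the raw rate is about 1.47 Mbits/s, and the temporal savings come out near the factor of two reported in the text (the text's slightly different bounds reflect "slightly less than" 11 fps).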

The degree to which spatial and temporal compression may
be joined depends on the degree of interaction between the
two domains. Although activity-index subsampling yields sequences
that are more intelligible than constant subsampling
for the binary images used in the present study, the overall
level of intelligibility, in relation to full gray-scale sequences,
was reduced (although this particular comparison is across
different groups of subjects). This interaction may suggest that
there is limited promise in combining the two forms of
compression. Indeed, Sperling et al. (1985) found that extreme
spatial compression yielded frames that were temporally
decorrelated, so that additional temporal compression was
ineffective.

It may be, however, that the particular form of spatial
compression used in the present study undermined the success
of activity subsampling. Our binary images were constructed
by painting 10% of the pixels on the dark side of edges black
on a white background, and the selection of pixels to darken
might vary with slight changes in the signer's position. The
physical representation of the signer within the sequence (i.e.,
the contours) emerged as a result of the juxtaposition of the
black pixels averaged over several frames. Accordingly, a
single frame taken from the middle of the sequence may
represent the form of a human only very poorly; motion is
necessary for the true physical structure of the signer to
emerge. By disrupting the temporal characteristics of the
sequences, we induced a breakdown of the spatial structure
of the signer herself. Naturally, with the loss of spatial structure,
intelligibility suffered. A better test of the temporal and
spatial interaction would be to use a spatial compression
scheme that preserves the structure within a frame without
relying on motion cues and spatial averaging that occur
between frames.

Rotational and circular motions. It was noted in the introduction
that signs with rotational or circular motions present
a unique problem to the sort of temporal segmentation conducted
by the activity index. There are changes in the direction
of the moving component (or components) without a
corresponding change in velocity or acceleration. Depending
on the criteria used to define a rotational or circular motion,
between 6% and 15% of the signs used in this experiment
could be so classified. By using the stricter criteria for inclusion,
activity-index performance was compared with constant
subsampling performance within a group of five signs with
rotational or circular motions. There was no statistical difference
between the two subsampling schemes within this group
of signs. As expected, activity-index subsampling presented
no advantage over constant subsampling for this group. In
addition, intelligibility for the group of five rotational/circular
signs was compared with intelligibility for the entire stimulus
set. Although the difference was not significant, almost certainly
a result of the small number of samples, intelligibility
for the rotational/circular signs was, as a group, slightly lower
than that of the complete set. Again, this is consistent with
our expectations.

Despite the shortcomings of activity-index subsampling
that appear when confronted with circular or rotational signs,
the effectiveness of this technique is not likely to be greatly
affected in any environment that more closely resembles the
real world. In a continuous stream of signing, contextual
constraints of the conversation will increase the overall intelligibility
of the individual signs. Although there is a ceiling
effect on many easily interpreted signs, these other, more
difficult signs will be made more intelligible. Furthermore, we
note that these signs usually represent a fairly small percentage
of the total number of available signs.

Application: Real-time computation. Sperling et al. (1985)
demonstrated that telephone transmission of intelligible ASL
was feasible; the experiments presented here indicate how this
minimal transmission can be improved by activity subsampling.
In order for such a system to be useful to the signing
public, the necessary hardware must be relatively affordable,
easy to use, and widely available, and the processing must be
carried out in real time. In the present study, all computations
were performed in software and required considerable computing
power and time. However, the computations (the
accumulation of frame-by-frame differences) were deliberately
chosen to be of the kind that are easily embodied in
parallel microprocessors. Indeed, we do not see any purely
technical obstacles to producing video telecommunications
devices that can transmit intelligible ASL over the ordinary
switched communications network. Such facilities would have
enormous practical significance for the signing deaf and hearing
impaired, reducing their isolation from each other and,
one hopes, from the hearing community at large.
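
To make "the accumulation of frame-by-frame differences" concrete, the following is a minimal sketch of one way such a computation could drive frame selection. It is not the procedure used in the study, whose selection rule is not spelled out in this section: the squared-difference measure, the rule of spacing kept frames evenly in accumulated activity, and all function names are assumptions made for illustration.

```python
def frame_activity(prev, curr):
    """Sum of squared pixel differences between two frames (lists of rows)."""
    return sum((a - b) ** 2
               for row_p, row_c in zip(prev, curr)
               for a, b in zip(row_p, row_c))


def activity_subsample(frames, n_keep):
    """Pick up to n_keep frame indices spaced evenly in accumulated activity."""
    cumulative = [0]
    for prev, curr in zip(frames, frames[1:]):
        cumulative.append(cumulative[-1] + frame_activity(prev, curr))
    total = cumulative[-1]
    if n_keep < 2 or total == 0:
        return [0]
    chosen = []
    for k in range(n_keep):
        target = total * k / (n_keep - 1)
        # First frame whose accumulated activity reaches the target.
        idx = next((i for i, c in enumerate(cumulative) if c >= target),
                   len(cumulative) - 1)
        if idx not in chosen:
            chosen.append(idx)
    return chosen


def constant_subsample(frames, n_keep):
    """Pick n_keep frame indices at a fixed step, ignoring image content."""
    step = max(1, len(frames) // n_keep)
    return list(range(0, len(frames), step))[:n_keep]


# Hypothetical 1 x 1 pixel "movie": still, a burst of change, then still.
demo = [[[0]], [[0]], [[0]], [[0]], [[5]], [[10]], [[10]], [[10]]]
print(activity_subsample(demo, 3))   # indices cluster around the burst
print(constant_subsample(demo, 3))   # indices ignore where the change occurs
```

Each difference depends only on corresponding pixels of two frames, which is what makes this kind of computation easy to distribute across parallel processors.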


Static  Presentation 

Optimal number of frames. Two interesting findings
emerge from the static presentation conditions. First, it is
encouraging to note that even in the most difficult condition,
there was still a 30% chance of correctly identifying the
presented sign. This attests to the robustness of the ASL signal.
Second, as noted in the Results section, performance for static
presentation of full gray-scale signs declines when there are
more than 6 fps in the activity sampling condition. Why?

Subjects reported difficulty with the task of scanning
through a page of “printed” ASL frames, although they improved
with practice. The most common complaint was that
there were “too many frames to see what was going on.” If it
were simply the case that there was an optimum number of
frames for each sign, then we would have expected to see
evidence of this in both subsampling conditions. Yet, this
pattern emerged only for activity-index subsampling. The
difference is that although activity-index subsampling chooses
critical frames, when the frame repetition factor m is increased
in constant subsampling, critical frames are just as likely to
be discarded as any other frames. For constant frame-rate
sampling, the improved performance that would have resulted
at the optimal frame rate is compensated by the loss of critical
information. It is not just that there is an optimal number of
frames, but that there are optimal frames, and that activity
index subsampling is one method of discovering optimal
subsets of frames.

Automatically generated ASL text. The ability of subjects
to “read” static signs and our ability to use digital image
technology to produce static text raise an intriguing possibility:
messages or even books composed entirely of signed
sequences. The automatic production of such static text offers
ASL signers an opportunity for veridical representation of
ASL conversations that is understandable directly without
mechanical aids, such as VCRs. Direct quotes, jokes, announcements,
and the like can be communicated with individual
expression and intonation. It remains to be determined
whether signers could, with practice, become sufficiently proficient
at reading signed text to make these possibilities
practicalities.

Sperling, Landy, Dosher, and Perkins (1989) proposed an objective 3D shape identification task
with 2D artifactual cues removed and with full feedback (FB) to the subjects to measure KDE
and to circumvent algorithmically equivalent KDE-alternative computations and artifactual non-KDE
processing. (1) The 2D velocity flow-field was necessary and sufficient for true KDE. (2)
Only the first-order (Fourier-based) perceptual motion system could solve our task because the
second-order (rectifying) system could not simultaneously process more than two locations. (3)
To ensure first-order motion processing, KDE tasks must require simultaneous processing at
more than two locations. (4) Practice with FB is essential to measure ultimate capacity (aptitude)
and, thereby, to enable comparisons with ideal observers. Experiments without FB measure
ecological achievement: the ability of subjects to extrapolate their past experience to the current
stimuli.


Our article (Sperling, Landy, Dosher, & Perkins, 1989;
henceforth, the source article) proposed the following: (1) An
objective task that involves 53 different shapes to measure
shape identification performance in kinetic depth effect
(KDE) experiments; (2) an algorithm for the structure-from-motion
computations that subjects perform on these and
similar stimuli; and (3) a distinction between three kinds of
computations. We distinguished (a) the true KDE computation,
(b) a KDE-alternative computation, an informational
equivalent to the true KDE computation but carried out
elsewhere in the brain, and (c) an artifactual computation that
arrives at the correct response in a given task but is based on
an incidental property of the display. We motivated our
discussion by pointing out difficulties in previous work on
KDE that we believed could be remedied by measuring objective
performance in tasks like ours.

Braunstein and Todd, two experimenters who felt unjustly
criticized, wrote a commentary (Braunstein & Todd, 1990) in
which they argued that (a) we dismiss legitimate 2D relative
velocity cues to KDE as artifactual, (b) our experimental task
was not exempt from the criticisms we levied at others, and
(c) KDE should be measured in tasks in which subjects are
not given feedback about the correctness of their responses
(so that the subjects do not learn to use artifactual cues).

Braunstein and Todd’s (1990; henceforth, the critics) point
(a) reflects an unfortunate misreading. In fact, we proposed
that “the structure-from-motion algorithm . . . involves finding
local 2D velocity minima and maxima and assigning depth
values to these locations in consistent proportion to their
velocities” (Sperling et al., 1989, p. 839; see also Figure 6, p.
838). In this article, we attempt to further clarify the role of
relative motion cues in KDE tasks (using 2D dynamic images
to answer questions about 3D shape) and in the continuum
of mental computations between true KDE and truly artifactual
computations. KDE is the perceptual experience of 3D
object depth evoked by dynamic 2D images. The opposite
end of the continuum is an artifactual computation that
arrives at the correct response for the experimental task by
using an incidental property of the display. Although artifactual
computations need not involve motion cues, in the source
article we gave examples of some that do. These computations
were called artifactual because motion entered the computation
in a way that shortcut the KDE computation. In one
example, a measurement of absolute velocity at a single fixed
point in the display would have sufficed to yield the correct
response. Our aim was not so much to classify computations
but to actually deduce the minimal computation that would
suffice to solve particular KDE tasks. In a well-constructed
task, there is no computational shortcut: the minimal computation
is the KDE computation or is essentially equivalent
to it.
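
The quoted algorithm (find local velocity minima and maxima, then assign depth in proportion to velocity) can be sketched in a few lines. This is a simplified illustration on a one-dimensional velocity profile rather than a 2D flow-field, and it is not the implementation from the source article; the function names and the `gain` constant are hypothetical, and neither the sign nor the size of the proportionality is taken from the text.

```python
def local_extrema(values):
    """Indices of strict interior local minima and maxima of a 1D profile."""
    return [i for i in range(1, len(values) - 1)
            if (values[i] > values[i - 1] and values[i] > values[i + 1])
            or (values[i] < values[i - 1] and values[i] < values[i + 1])]


def depth_at_extrema(velocities, gain=1.0):
    """Assign depth = gain * velocity at each local velocity extremum.

    `gain` is a hypothetical proportionality constant used only here.
    """
    return {i: gain * velocities[i] for i in local_extrema(velocities)}


# Hypothetical signed image velocities sampled along one line of a display.
profile = [0.0, 0.5, 2.0, 0.5, 0.0, -0.5, -2.0, -0.5, 0.0]
print(depth_at_extrema(profile))             # depth at the two extrema
print(depth_at_extrema(profile, gain=-1.0))  # same shape, depth order reversed
```

Note that a single absolute-velocity reading at one fixed point, the shortcut described in the paragraph, would not distinguish profiles that differ only at their other extrema.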

The critics' arguments (b) and (c) were anticipated and
considered in the source article. Here we elaborate the source
article’s discussion and respond to two newly raised fundamental
issues (How should experiments be conducted? and
How can a subject's mental computations be exposed, measured,
and controlled?) and to other issues (immediacy, practice,
and scintillation) that pertain to our specific task.

The 53-Shape KDE Task

Motion  Produces  an  Immediate  Experience  of  KDE 

Our task uses 53 different shapes whose surfaces consist of
random-dot textures. Each shape is defined by three equally
spaced locations. Each location contains either a hill (+1) or
a valley (-1) or is flat (0). Shape is constructed on one of
two  sudi  sets  of  locasocs.  A  sssooelasf  C3ss£xs»* 

uon  tat)  h2B  tsto  a  fk%s.  asd  tacrps  a  t&rsc^ 
coafiguraaoomSooscycead-ogloocad.Tbcsa&gasdco* 
dot  suite  k2s  CO  shz^  cces  aad  locSs  eSsdv  fist  Wba 
the  soi&cs  bepns  to  C30i«  ia  a  fEsde  rocsy  oscS^oo.  it 
appeals  icstamly  to  have  a  parted  t&apc.  la  ibs  socrss 
article we wrote, "All subjects reported that they perceived a
3D surface the first and every time they viewed the
numerosity displays" (p. 830). We have demonstrated these
KDE shapes to subjects, to visitors, and to hundreds of
observers at numerous lectures and have not yet received even
one report of an observer who did not perceive a vivid 3D
shape. We believe it is a natural, ecologically valid test of
the shape identification functions that KDE has evolved to
perform.
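
The count of 53 shapes follows from the construction described at the start of this section (three locations, each a hill, a valley, or flat, on one of two sets of locations). The short enumeration below is a sketch under one assumption that the text does not state explicitly: the completely flat surface is the same stimulus whichever set of locations is used, so it is counted once. The set labels are hypothetical.

```python
from itertools import product

HEIGHTS = (+1, 0, -1)                 # hill, flat, valley (from the text)
LOCATION_SETS = ("set_A", "set_B")    # hypothetical labels for the two sets

shapes = set()
for location_set in LOCATION_SETS:
    for heights in product(HEIGHTS, repeat=3):
        if all(h == 0 for h in heights):
            # ASSUMPTION: an all-flat surface is identical in both sets.
            shapes.add(("flat",))
        else:
            shapes.add((location_set, heights))

print(len(shapes))   # 2 sets x 26 non-flat patterns + 1 flat surface
```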


Under normal viewing, no learning or practice is needed to
perceive these 3D shapes. The critics' notion that practice is
needed to perform the basic task is wrong. However, practice
is helpful for some aspects of the task. (a) Subjects must learn
to correctly use the naming convention for these shapes.
This is usually learned in just a few views of sample stimuli. (b)
All subjects remember the shape of the oscillating surface, but
unpracticed subjects frequently forget the final direction of
rotation. With corrective feedback, they learn to report both
shape and motion. (c) Some shapes are deliberately made
quite similar. For example, in these stimuli, two adjacent hills
combine to form an elongated hill. The distinction between a
two- and three-hill configuration might be overlooked by a
subject who did not receive feedback of the correct response,
but the distinction is easily learned. However, even highly
practiced subjects, with feedback, cannot infallibly discriminate
between the two differently oriented three-hill configurations.
(d) Small amounts of image degradation are easily
tolerated by all subjects. However, to correctly identify shapes
when the number of surface dots is grossly reduced or when
the signal-to-noise ratio is reduced takes practice.


The critics suggest that KDE experiments be conducted
without informing the subject about the correctness or incorrectness
of the response (the 3D structure derived from motion),
that is, giving the subject no feedback. By eliminating
feedback, the critics hope to avoid the problem of the subjects'
learning to use incidental cues. We believe that the better way
of dealing with incidental cues is to eliminate them, or when
that is not feasible, to mask them or render them useless by
irrelevant variation (see below). What we address here is the
larger question of what can be learned from experiments with
and without feedback.

An experiment without feedback is essentially an epidemiological
investigation. It investigates the current status of a
skill that has been acquired prior to the experimental situation.
Therefore, the no-feedback experiment is simplest to
interpret when the current test is identical to a previous

Ess  the  oqpsd  cxrsrc  ^  ci;pert::L3fl  srsa  be 
jsserpeaed  ga  tosa  cft&e  ease  cf g-graTrgraa  cfpgviiaa^ 
feararJ  siSSs  so  t£g  ae*  argsg  saadS,  Tba^  ao  tea  i£e 
^bSist  cf  gcbcs  io  £scsa  sdhSg  ssaea  framurs  n  boef 

Tcsifcs  pSecs  wSh  Per  53  ea-uAwa-dsa  sSaggx  wSetea  Eaa 
aSteSag  ttea  xa  cgyw;a.-j:y  p  gtaiaasg  Ss&2eSc, 
we^tJcscg^fec*tScrpg>ijg;.rfyac9r5g3s83Hg5aag- 
?Sred:ii»qcMagibe  jgyrtfrte  feggarasyjgeatetSeg 
perae^  Icanaed  KI^  a3i2l  oc  tbsr  a&S&y  to  as^=»  oew 
dcSs.  EcefiKk  apetesses  csadi  ts  «bzs  fcs=3=s  ca  23d 
<3‘*scg  tea  to  do— the  eff  hrr’.TS  pgSeesaars.  Bc- 

cagseap&TmgscswkhferdGarkpcgbetheBsascffperfcrta- 
23cs;  Hxv  ars  scfl  efeesaghg  foe  He  <Sg^gsy  ct 
pcixv  aetfa  alsas.  fa  po-fe^adc  opectea^  the  ca- 

kaewa  ttesrg  fcaarkxi.  aad  the  <Svcggacg  of  tart  aad 
tararag.  pose  pcoKeas  fee  ttec^rVal  araJvsa. 

Ideal Observers

One kind of investigation that has been particularly informative
about human cognition is the comparison of human
performance with the performance of an ideal
observer (Green & Swets, 1966/1974; Sperling & Dosher,
1986). Indeed, the efficiency of information use by humans
relative to ideal observers is of practical as well as theoretical
interest. Tracing information loss through the stages of sensory
analysis yields important insights into sensory processing
(e.g., Geisler, 1989; Parish & Sperling, 1987). It would be of
great interest to know, in noise-perturbed KDE displays, what
the efficiency of human shape identification is in relation to
that of an ideal observer. When the efficiency of human
perceptual processing is high, we suspect that the task exposes
processes that are of evolutionary significance.

To compare the processing efficiency of human and ideal
observers, we need to specify exactly what each kind of
observer knows about the experimental procedure, such as
the a priori knowledge about the probabilities of various kinds
of stimuli and the pay-offs. This implies an experiment with
expert feedback. To test and evaluate sophisticated models
of human mental computation requires us to bring into the
laboratory much of the training that often is assumed to have
occurred naturally, and it requires more complex and more
explicit laboratory procedures than have been used in the
past, all with feedback to the subject.

Introspection Versus Objective Measures of
Performance

Early psychologists such as Wundt (1905) and James (1890)
attempted to distinguish the new discipline of experimental
psychology from the natural sciences by emphasizing different
methodologies. They were especially concerned with how
things appeared to them (introspection) rather than with
measurable skills and abilities. An important component of
the development of psychology has been the move away from
introspection (now viewed as an extended verbal report)
and toward behaviors that are simpler, more easily measured.
coe&otatsttsSpeciowtbsxfessf&aiSsaasdxt^
capcfectsatcr, 

KDE, Alternative Computations, and Artifactual
Computations

Motion Input to KDE Computation

A  saaacay  ssfs:  ones!  •KSh  laadaca  dos  tppon 
<p^fii2.Assaa32sl!e5s&£5aasionce£.itspc)»vn3 
asla»ias&?c3>— gttl;Tyg«fcpQicSa(KlEt.Iiacse&i 
to  i2as3c  of  KI£  is  ta  stsap^  aiidi  is  Ue  penxp- 

veo  of  dtpeb  iadscod  by  tbs  <bspaiiy  ^Satsca  ieMta 
isxpsa&c'k&iaiit^eiXS.lSTCkazsiiSosytoaa- 
copfis.tbc<£spsitydiSbiascsbea<aad93iatKDsxECssne 
baas  of  oe;  laadoaxdot  <£s;^s  soSce  to  ssijuts  a 
pood  tapccsson  of  KDE  dqiicb  (laady,  Dosbtf,  ^etS::^  & 
Poldas,  i9SS;  Laady,  Spob:^  Poioas,  &  Dosba;  I9S7; 
Todd.  I9SSX.  Bccacso  oaly  sdooQ'  a  eoapaaUk  froa  t»n 
frames, and not acceleration (which requires three), KDE
perceived in two-frame displays implies that acceleration is
not needed for KDE. Similarly, constructing a sequence of
frames so that each individual dot has a lifetime of only two
frames yields high shape identification accuracy (Dosher,
Landy, & Sperling, 1989). These, and other of our results,
imply that a velocity flow-field is a sufficient stimulus for
KDE. The identities of the moving dots are preserved only
long enough to yield a velocity estimate; subsequently, only
the velocities and not the dots themselves are needed for the
KDE structure-from-motion computation.
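
The claim that two frames support a velocity estimate whereas acceleration requires three follows directly from finite differences. The sketch below uses hypothetical dot positions and unit frame spacing; it illustrates the counting argument only and is not a model of the perceptual computation.

```python
def velocity(x0, x1, dt=1.0):
    """First difference: needs the dot's position in two frames."""
    return (x1 - x0) / dt


def acceleration(x0, x1, x2, dt=1.0):
    """Second difference: needs the dot's position in three frames."""
    return (x2 - 2 * x1 + x0) / dt ** 2


# Hypothetical positions of one dot in three successive frames.
positions = [0.0, 1.0, 4.0]

print(velocity(positions[0], positions[1]))   # available after two frames
print(velocity(positions[1], positions[2]))   # a second velocity estimate
print(acceleration(*positions))               # available only after three
```

A dot that lives for only two frames therefore contributes exactly one velocity sample and no acceleration information, which is the logic behind the two-frame-lifetime manipulation described above.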

The human perceptual system uses two fundamentally
different computations to extract motion flow-fields. First-order
motion analysis is served by motion detectors that
approximate a spatiotemporal Fourier analysis based on stimulus
contrast (Adelson & Bergen, 1985; van Santen & Sperling,
1984, 1985; Watson & Ahumada, 1985). Second-order
(non-Fourier) motion analysis uses more complex stimulus
properties and invariably involves rectification (e.g., the absolute
value of contrast; Chubb & Sperling, 1988, 1989).
Dosher et al. (1989) showed that the first-order system was
the predominant contributor to KDE in our random-dot
stimuli (because only it had sufficient spatial resolution),
although second-order motion computations could yield limited
KDE under special conditions.
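
The role of rectification can be illustrated with a static, one-dimensional toy pattern. In the sketch below, a random-sign carrier is multiplied by a slowly varying contrast envelope; a Fourier analysis of the raw contrast finds almost nothing at the envelope frequency, whereas the same analysis after taking the absolute value recovers it. This shows only the rectification step, not a motion detector, and the pattern, its parameters, and the function name are assumptions made for the example.

```python
import cmath
import math
import random


def fourier_amplitude(signal, cycles):
    """Amplitude of the DFT component with `cycles` periods across the signal."""
    n = len(signal)
    total = sum(s * cmath.exp(-2j * math.pi * cycles * i / n)
                for i, s in enumerate(signal))
    return abs(total) / n


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
N, CYCLES, DEPTH = 4096, 8, 0.8        # hypothetical pattern parameters

# Contrast envelope (always positive) and random-sign carrier.
envelope = [1.0 + DEPTH * math.cos(2 * math.pi * CYCLES * i / N)
            for i in range(N)]
carrier = [rng.choice((-1.0, 1.0)) for _ in range(N)]

stimulus = [c * e for c, e in zip(carrier, envelope)]   # signed contrast
rectified = [abs(s) for s in stimulus]                  # absolute contrast

raw_amp = fourier_amplitude(stimulus, CYCLES)      # near zero: carrier hides it
rect_amp = fourier_amplitude(rectified, CYCLES)    # DEPTH / 2 after rectifying
print(raw_amp, rect_amp)
```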


Structure-From-Motion Computation

(£vea  fias  tbe  Ss-eodm  vcbdQr  Sba-SrSd  is  dbc  iicpits 
srarmiV'.sfeclvD^SKspgSSracagiptaEiMsSagsKyfidbe 
■pjxrKanrxtrartSDstapefeeaecerEPsaitrrS  Cca^ 
eraep  that  die  smeaS  «S3  nonsd  by  peaSd  poipssnsa  and 
«sdi  de  acs  of  sueatba  peqseadEraiar  in  the  aae«3>c  zep 
depstc^isdesaseas&cisociaapijaSacThefSmacs 
leoieea  laca2  o^ed  depdi  aed  do  xicaa  ot^  diepdb  is 
pcopectisadsodeiStSscaocbeomaaelocaiimatevdbc? 
a9ddeo«eaaiaapevdociey.Thes^ofdcprapacd»3S? 
ff-  ce  — )  at  sadcaasssted.  To  cffcnCTaatle  aaaoss  as  S3 
smssS,  it  is  Bcs  aceessmy  so  coBpBee  dqxb  ctxeyaheec  in 
iVifagf  ForrrBrvfc;.itna5assodcaeaeaedcp(a(vdae- 
i3)a:sbenbcara!nsia«h5dba  l.(Lor— I  coeddbetbeed 
fe^s:jrfcscoiasi.-og5aa.ASa53advd^,isi»odabes»f- 

firir^to WanerfeeietnrSfymrvam-vi-nnr-wjrri,—'..-,  v-,,V--r-^- 

afcees«afc>eddgriiiiCsrofsaetat.Taag3,andtogaeqotse 

srwei^sndgiefera—jdjo-rveiyuti-eeriv  (ApSy-oW- 

pealc  descr^eoc  woeld  tahe  the  vadsES  semraf  (siq^  peab 
csod  in  coesenxaoe].  e!ae^sx3  {the  crrdxarooQ  of  t*o 
peahs).  or  edaxfi  (the  mrahfrrtim  of  duee  peaL^-)  Af- 
dsoe^  the  precise  locabca  of  the  excemum  woedd  Ga  the 
dsesceofaaise)besu£5des:fi9raaideddraeesoeto£scsir:>- 
inaahet«eea  the  53  shapg.  human  observers  cse  the  aaoal 
sbapeintbesai{SIio.-hoodoftheexlremt!.-niatheirjadpsra 

KDE Versus Non-KDE Computations

In the source article, we proposed a continuum of computations
ranging from a true KDE computation, in which 3D
shape assignment is accompanied by the introspective impression
of depth structure, to artifactual computations. Artifactual
computations can be eliminated by appropriate stimulus
manipulations. The more troublesome possibility is a KDE
alternative computation that is algorithmically equivalent to
the true KDE computation and may share the same motion
inputs, but is not accompanied by the introspective impression
of a shape in depth. In the source article, we provided a
task in which the subject viewed isolated patches in which
dots moved at the velocity of the key locations in our KDE
displays. Performance in this task, which did not involve
KDE, demonstrated not only that an alternative computation
could occur, but that the pattern of responses and errors
produced by the alternative computation mirrored the
response pattern in the KDE task.

Here we consider perhaps the most troublesome possibility:
true KDE supplemented by other computations. Suppose, for
example, in viewing one of our complex random-dot shapes,
the subject (a) experiences weak KDE and sees a hill and an
ambiguous area, (b) observes that dots in the hill and in the
ambiguous area of the display are moving in the opposite
directions, (c) infers that these two subareas represent the
opposite depth planes, and (d) uses both sources of information
in his response. How can one deal with the problem of
discovering the algorithms underlying KDE performance in
KDE tasks and their precise implementation? The critics
assume that subjects naturally use KDE in KDE tasks, and
that by not giving subjects feedback on the correctness of their


G. SPERLING, B. DOSHER, AND M. LANDY




KS^^ciEK&s&fi^sssi^sasIcznatiaBseae^SKSsiBcraflQe?- 

ex^csapsa&xa^ 

rrwwKwmti^  parfrf  saesai  ME 

f>^rr'rT»w-n<egBrfta!jtrgaMfflac4Cl^ 

&£S  S9  dea»e  scse:ars  Cso  sDstbsn.  NBa4CD£  csBBpirea* 

«ee&»&3s  S9  devSop  oeStodt  fio  aocwtra:  t^ea.  Besasae 
KD£  s  £E£aiE&&d  &SSB  2  KDE  a&e5£M  67  a  sa^sm 
ispcsste  cf  pBSrtcd  depdu- sa^KM  flEpcKOit  cs  a've- 
caagy  ccsgcgeja  ia  tte  prctyrwtfrBTs  annat  By  dt«er 

esfsss&cd  £cea  KDEaad  KDE  a&fscafia'W&esas 

f  p~«g»»^  «m  ct*' trp 

t3  agcpartfiPg  ffcsHao  ct  AT^^.Tnaff  asd  ^asai^ 
SSL  «e  as]  is  is  ascssBgy  10  aecacas  cbe  iacf>- 


spe^e  appccadli  sacses^  CESasaeas  ia  CEpeiao* 

^  procrd:^  o  230  fiscal  ove  cs^al  £csccs  as  tbey  as 
cSjcncred.  For  gus.v?tf.<£a  oer  ideadficidoa  asftL 
stages  of  eocai5fta£te  cocyfrVtsy.traVrs  arri&ffsri 
coesgetz^ocs  (s^da  tsfy  oa  sc^  g-mry.n  of  saddcBSd 
isSoRSsaoa)  alacsi  rsrirwi  Kergta^  sk^ia^  cars  £cdy 
bnsf  cnaae  a  rdzdvc  £s3dt3=S2e  fir  soa^  (vsscs  pttdH) 
car.pcaxaocs.  To  fenber  ^Krir.irare  be;  mesa  KO£  aad 
ooo-KOE  cocpr-^rkes.  (be  soeere  a:ix^  pcopcsed  dsd 
£2sbs  :bs£  sdsahdy  sserfisod  wib:  ibe  caeaul  peooessses 
resosrcs.  .jqssed  fixralieRattiito 


Multilocation Motion Tasks

The source article argued, and the critics appear to agree,
that the 53-shape identification task is less susceptible to
artifactual computations than earlier tasks. However, the
critics argue that the distinction between the 53-shape task
and previous KDE tasks is not based on any fundamental
principle, but is based merely on the number of locations
from which velocity information must be extracted in order
to perform the task. In rebuttal, we show here that the
distinction between one or two versus more locations is critical
(because either the first- or second-order motion system can
provide simultaneous velocity information about one or two
locations, but only the first-order system can provide
information about more). In the source article, we showed that the
structure of the earlier tasks permitted information from only
one or two locations to discriminate perfectly between
alternative responses when the same information should have been
insufficient to construct a 3D shape representation.

Dosher et al. (1989) showed that complex shape identification,
based on motion at three or more locations, operates
very differently from motion extraction at one or two
locations. They used stimuli that were designed to selectively
stimulate either the second-order motion system or both the
first- and second-order systems and compared them in four
tasks: the 53-shape identification task, a threshold detection
task for motion in a single patch, a threshold direction-of-motion
task in a single patch, and a motion segmentation
task that required finding the one of nine possible locations
at which there was an odd direction of motion. Manipulations
that disrupted first-order motion information (such as rapidly
alternating the contrast of stimulus dots on a gray ground


Kict  amd  w&iiag)  kdutrf  jAgeSfearjaa  cf SB  sfcj^^ 

p£ad  ^  aeca  sesaeasamaa  cbsSl  Fsfixsasee  «3S 
aSszQ  to  sopftb&ca^  t&as  aeses  xdooy  isfixssise 

based  ce  caiy  coe  oe.two  iwarTciM  bi  cnrf.Tff.  draretjea 
P»  josSgsaezes  iB°sc2k 

ptfdto  rf  gfaeac  gtcteB  saswed  (Sgayi^  off  £gs<pag 


h&esaba^ 

Tecae  aae  sr»eal  ceasnat  v&y  actfiSctiai  cecsgesi^oess 


baaed  eaceecco>olxieaca!S  any  saoiiefiasjxdgr  actios 
dSexydstt  mfeca  KDE  cair.g»,',  rarinas  eagace.  VeSxsgr  jafec* 
ssaeiss  aboa  oae  or  two  ioca&aas  sa^  be  obCKs^  cs±er  ^ 
tmelag  iadgiidbal  ssage  fiaetstSy  cc  ^  gfixaasjco  pabc" 
estod  tbescgb  a  secced^eder  ^sscau  HswensL  sescffid-ecder 


(ggagfca5fi>2»eocfegwave)tbaeagfi^lrwofisSx=5aoo 
ia  friarlco  to  fitss-ceder  aaocion  (Cbcbb  &  SfcSjg.  193^ 
Spedkg.  19^  Espekai^.sccccad^eiSer  bSxsaSoo  b  tk' 
tsaSy  rsssSssed  to  fineaL  ad  e«ea  there  it  b  of  bw  spa^ 
rgscfcaoo(OabbA:Speg£g8, 19S3L  19S9).?Coess::;cys23y, 
wbea  tbe  rscs^esy  of  3D  sbjpf  feqeSmssimgb33fc.es  isibr' 
naSm  abort  sedoo  ia  Ibsee  or  more  bcadccs.  s(  b  ex- 


tmody%rfafrabiif  tol£sn^pc5oocffig^o^Sgp£iym^tioaL 
The Dosher et al. (1989) results are one cogent empirical
reason why we make the distinction between tasks that
require observation of only one or two locations and those
that require more. The 53-shape KDE task has been demonstrated
to require the first-order motion system. Indeed, first-order
motion appears to be the essential input to all complex
KDE representations. Tasks that can be solved by extracting
velocities at two locations do not necessarily involve either
KDE or the first-order motion system. Our criticism of other
tasks has been that they easily can be solved by computations
that are less than the whole KDE computation. Indeed, it
would be a step forward if the KDE experimenters offered a
plausible KDE computation against which performance in
their tasks could be measured.


How to Deal with Artifactual Cues

Dot Density: An Artifactual Cue in KDE Experiments

The source article investigated structure from motion. In
this context, a nonmotion shape cue such as a local variation
in 2D dot density is artifactual. (Of course, in a shape-from-texture
task, texture-density cues would be primary.) We
eliminated static (single-frame) density cues by resampling
stimulus dots when necessary to maintain a uniform 2D
density on each successive frame of the motion stimulus, as
follows. The stimulus field was divided into 100 fixed areas
of equal size. Each area was constrained to have three dots in
every frame. The 3D motion of the surface between frames
causes dots to wander in and out of areas, some areas having
a net inflow and others a net outflow. Therefore, to satisfy
the constraint of having a constant number of dots, dots were
added or subtracted at randomly chosen locations within local
areas. The fraction of new-plus-discontinued dots divided by
the fixed number of dots in an area is the scintillation fraction.
Our displays typically required 5% frame-to-frame
scintillation.
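
The resampling rule just described can be sketched in a few lines of
Python. This is only an illustrative sketch under stated assumptions,
not the original stimulus-generation code: dot positions are assumed
to be (x, y) pairs in the unit square, the 100 areas are taken to be
a 10 x 10 grid, and the names cell_of and equalize_density are
hypothetical. The scintillation fraction is computed over the whole
field (added plus removed dots divided by the fixed total of 300)
rather than per area.

```python
import random

GRID = 10           # 10 x 10 = 100 fixed areas, as in the text
DOTS_PER_AREA = 3   # three dots per area on every frame, as in the text


def cell_of(dot):
    """Return the (row, col) area containing a dot; x and y assumed in [0, 1)."""
    x, y = dot
    col = min(max(int(x * GRID), 0), GRID - 1)
    row = min(max(int(y * GRID), 0), GRID - 1)
    return row, col


def equalize_density(dots, rng):
    """Resample dots so that every area holds exactly DOTS_PER_AREA dots.

    Returns (new_dots, scintillation_fraction), where the fraction is
    (dots added + dots removed) / (fixed total number of dots).
    """
    cells = {(r, c): [] for r in range(GRID) for c in range(GRID)}
    for dot in dots:
        cells[cell_of(dot)].append(dot)

    changed = 0
    frame = []
    for (row, col), members in cells.items():
        # Net inflow: discontinue randomly chosen dots in this area.
        while len(members) > DOTS_PER_AREA:
            members.pop(rng.randrange(len(members)))
            changed += 1
        # Net outflow: add dots at random locations inside this area.
        while len(members) < DOTS_PER_AREA:
            members.append(((col + rng.random()) / GRID,
                            (row + rng.random()) / GRID))
            changed += 1
        frame.extend(members)
    return frame, changed / (GRID * GRID * DOTS_PER_AREA)
```

In use, the projected dot positions of each new frame would be passed
through equalize_density before display; a frame that already has
three dots in every area is returned unchanged with a scintillation
fraction of 0.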
In the source article, we assessed the density cue by
producing a display that had no motion cue, only the
extracted density cue. Using the density cue alone, only 1 of our
3 subjects achieved above-chance shape identification. On the
ocher  laadlyiihtbeiDadbocae  (mg£gea*&g  <Ssa^ 

an^&a)L  idesaSaSoQ  alaon  as  saod  'as 
incta3Qaadd"."i!gyopestoettbef.TfeesS^fm9aLnr>^ 
fcsbi&fy  dse  to  ib?  srirtiy.y:)rCT  tto  accocjtf  sVJ  fgaovaJ 
of  Ibe  dssssy  a:a&c3. 

Scintillation

The critics point out that our method of removing the
density cue introduces a shape-specific variation in scintillation,
which is itself a possible cue in a shape identification
task. They propose avoiding practice and eliminating feedback
as the remedy. Are restricting the opportunity to practice
and eliminating feedback the optimal methods of dealing with
artifactual cues? Whatever the degree of practice or feedback,
we believe that it is better either to determine the possible
effectiveness of significant artifacts or to eliminate them, as
we did the density cue. Simply asserting that lack of practice
will suffice is not adequate. Indeed, the possibility that
scintillation variations might be a shape cue was considered in
the source article; it was dismissed because scintillation is an
even weaker cue than the extremely weak density cue. That
is, when displays are constructed without motion cues but
with only a density cue or a scintillation cue, it obviously is
harder to perceive areas where scintillation differs from the
average than where density differs from the average (Lappin,
Doner, & Kottas, 1980). This is because random error in dot
density as a direct cue to 3D shape occurs only because of
sampling error (limited number of dots) and the fineness of
the 182 x 182 pixel grid, whereas random error in the relation
of scintillation to a frame-to-frame change in 3D shape occurs
because of the coarseness of the 10 x 10 grid of local areas
within which density was kept constant. That is, because of
the way stimuli were constructed, scintillation density was an
objectively less reliable cue to shape than was dot density.
Like the dot-density cue, the scintillation cue in our displays
can be measured alone and it can be compensated or masked.

Displays were constructed that had a pure scintillation cue,
without the changing-density or motion cues. The only subject
who was able to perform above chance with the isolated
density cue also viewed the new displays. With a pure
scintillation cue, it was clear that his performance in a shape
identification task would have been even lower than that with
a density cue, although we did not feel it was worthwhile to
run a formal experiment. Conversely, displays were
constructed with normal density-controlled KDE cues, but with
extraneous scintillation added uniformly throughout the
display to mask the scintillation cue. Shape identification in
these displays (with the scintillation cue rendered ineffective)
appeared essentially equivalent to normal KDE displays.
However, adding extraneous scintillation reduces the
signal-to-noise ratio in the stimulus, and more extended observations
undoubtedly would reveal a slight impairment, due not to
the loss of the scintillation cue but to the added scintillation.
The bottom line for displays that are not scintillation-corrected
is that the residual scintillation cue could be used to
make some extremely coarse discriminations (e.g., there
probably is more scintillation on the left than on the right of the
display) that may support above-chance shape identification
for extremely sophisticated subjects; it is not a significant
factor when normal motion cues are available.

General Procedure for Dealing with Artifactual Cues

The procedures used to measure and to eliminate the
dot-density and scintillation artifacts illustrate a general principle.
If a particular artifact yields above-chance guessing, (a) measure
the strength of the artifactual cue in isolation and (b) construct
displays in which the cue is eliminated, masked, or rendered
useless by irrelevant variation. Creating the cue in isolation is
useful because bounds on the possible strength of the cue can
then be determined. For example, dot density and scintillation
were extremely weak cues. Eliminating artifactual cues in the
displays of interest is an ideal solution, but is not always
possible. Thus, dot-density cues could be eliminated, but this
introduced scintillation cues that could not be eliminated but
could be masked by adding still more scintillation. It was not
necessary to use the third general method of dealing with
unwanted cues: introducing irrelevant variation. For example,
irrelevant variation is used to eliminate motion extent as
an artifact in velocity estimation (McKee, Silverman, &
Nakayama, 1986).

In our experience, it has never been necessary or preferable
to deal with possible artifactual cues by using naive subjects
without feedback and hoping that the subjects do not use the
artifactual cues. To review our previous discussion: the
problem is that, for optimal performance, subjects must also learn
to optimally use the relevant cues, and this requires practice
with feedback.

Summary  and  Conclusions 

1. The extraction of 2D relative velocity is a basic substrate
for deriving 3D structure from dynamic visual stimuli for
both the true KDE and the KDE-alternative computations.

2. The 53-shape lexicon for our identification task presents
an ecologically valid test of shape recovery (KDE) for complex
depth surfaces.

3. Practice in the 53-shape task serves to optimize
identification performance; however, practice is not necessary to
immediately perceive vivid KDE.

4. Properly conducted, experiments with feedback can
measure the limits of human capacity; experiments without
feedback measure the ability of subjects to generalize from their
past experience to the experimental stimuli.

5. Excluding feedback in KDE experiments does not
eliminate the possibility that artifactual cues may generate a
correct response; it merely confuses the issue.

6. Scintillation is an insignificant cue in the 53-shape
stimuli.

7. Deriving high-resolution 3D structures from 2D dynamic
displays requires the first-order motion processing system. In
moving-dot displays such as ours, the second-order motion
system cannot be used to solve tasks that require simultaneous
access to velocities at more than two locations. Therefore, to
isolate the KDE performance supported by first-order motion
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(l)compuitns  reimotopic  image  motion.  (2)  com*  punut's  hands  land  on  (he  piano  but  on  a  wrong  «  moiionduringMccadcJiinotmcfelyahabUbulan  sensations  of.,motlon  re*lnterpretcd  during  safr 
puting  object  spatial  position,  and  (3)  a  decision  note  Ofdmanly.  the  pianist  does  not  entertain  the  unmodifiable  deficiency  would  requite  the  experi*  cades?”  remains  unresolved. 
Texture interactions determine perceived contrast

<igatgiitnMw/gcrqqtfiiB^/lte3iiiffi&a&a< 

CHARLES CHUBB, GEORGE SPERLING, AND JOSHUA A. SOLOMON

** ^ IV-ji-^nT  It «f  Ne»  tmX CawEStfS.* B«»- 

CMOrSofifif  ^  Grwpr 


ASSTS,^T  Firapific&cfraBAwitifaaJlctfxccadbe^ 
&iS  n  a  sacrwCis  fciJLgrwurf  tC  tmifar  Irtfarc,  «e 
AaowaCfate  Ifat  Ibt  ycmhci  cwtoat  (te  kxter  poldb 
AscrnSgscftouiiaaaffmtagCTaCxartrfiielijdL,yiqaA.1^^ 

lie  teiare  pfich  it  5anwB8<tJ  ty  Msjb<*atrgl  legartv  t>e 

tg^^praSafl^efcaiarcfuCA  jypur  Jiiwtr.a>Jskw4- 
tnciiasif,  its  darl  petes  apfcar  kss  iftcl  fkia  afeea  «S  is 
sarnoaiStd  I7  a  tauSicai  facft^SrivaA.  The  MaenS  fcdadSsa 
cCa^spvte  cterzsS  is  $ra<^  doBinkerf  «fcdi  iO  fester 
paSck  aad  taci;T*ead  are  Gfecred  tea  MMOvofaffias  spaSsI 
frt^Gearr  baais  «r  (i)  Ike  tteare  pilck  aad  fca4;;^a«nd  are 
prcseaerfl»gItr«Mqes.OarrLraftUareOTia>ic^»*iedtn-al 
csmaSlke«eics«fli:klatsspercrpekiiaadpeteteapBrBep> 
taai  flDtcfaaaai  te  eterail  gaia  eteni  eccani^s  al  aa  carlr 
ceetical  or  prtCMiJcal  anral  IXBS. 


of  a  cMAeal?  harjj.irfl  &sc  t-jcavd  00  a  Kirpt  ezefora 
ssgTocafeg  hacigrocad  ^epeads  po<  iSrecdv  00  the  fcs^ 
fxtase  of  cbe  ^sc.  ha  Riber  <n  tbe  1^230  of  Ssc  hsamace  t;» 
bacigfpcad  lr;r»nT»,r  £%o  a  spacePy  resxnsccd 
txxkpiTced  afloxs  poxm^  l^hoas  as  £&straciJ  tbe 
iSsMOQ  sbena  n  Fi^  1  o  aad  Tbei&scso  Fi$.  loaadb 
are  eqcsSssnsai,  ooortbdcss.  ihe  c»c  ta  a  appears  faster 
|]sa  dbc  a  b.  Ihs  pbexKKscfXKS  of szoaJAMroKs  coRirasr 
IS  fiKerprezed  la  lenns  of  a  mio  ru3e  b>  oocta$(fa2iiaolbe 
i3Zx>  of  ibe  d»cs  te.-nimacc  10  tadcjrocad  liiminanre  is 
ffcader  thaa  1;  sa  b.  (be  mta  is  less  than  1. 

Lateral Inhibition. A natural way to explain simultaneous
contrast is in terms of lateral inhibition. Many models based
on lateral inhibition have proposed that, at some level of
visual processing, neurons strongly stimulated by the
high-intensity background of the disc in Fig. 1b suppress the less
strongly stimulated neurons responding to the interior of the
disc. In Fig. 1a, the corresponding neurons within the disc
receive no such inhibition from the weakly stimulated
neurons surrounding them. Consequently, the neurons located
within the disc of Fig. 1a respond more vigorously than their
counterparts in 1b.

Under the crudest lateral inhibition model, the lightness of
a given point in the visual field would be suppressed in
proportion to the intensity of each nearby point (1). But such
a scheme would result, for example, in lower lightness values
for points near the edge of the disc in Fig. 1b than for points
in its interior. The fact that both discs in Fig. 1 a and b appear
to be of uniform lightness across their full expanse suggests
a more complex form of lateral inhibition (4). Regardless of
their details, all models that invoke the principle of lateral
inhibition rest on the assumption that the primary factor
determining the perceived lightness of either disc in Fig. 1 a
or b is the ratio, at the disc edge, of disc luminance to
background luminance.

The texture disc of Fig. 1c appears higher in contrast than the
texture disc in Fig. 1d, despite the fact that the two discs are
identical. (We describe a stronger form of the illusion below.)
The bright points in the texture disc of Fig. 1c appear brighter
than their counterparts in d, and simultaneously the dark
points appear darker than their counterparts in d.

For each of the discs in Fig. 1 c and d, the average
difference in luminance at the border between the disc and its
background is 0 (except for random fluctuations). In fact,
every single pixel in Fig. 1 c and d has an expected luminance
equal to the mean luminance. Therefore, except for random
fluctuations, any two areas of Fig. 1 c and d have the same
average luminance, and any consistent difference in
appearance between the discs of Fig. 1 c and d cannot be accounted
for by standard (luminance-based) lightness models.

EXPERIMENTS 1 AND 2: CONTRAST AND
LIGHTNESS INDUCTION

Method. To compare Fig. 1 c and d, most observers shift
their eyes back and forth between the two texture discs. To
produce a stronger version of the texture-contrast illusion
that does not involve eye movements, we use just Fig. 1d and
modulate the contrast of the background texture sinusoidally
in time between extreme contrasts of 0 and 1. In addition, we
produced new, independent realizations of the random pattern
instantiated by Fig. 1d 60 times per second. This produces
60-Hz texture flicker over the whole field, but it eliminates
any figural cues and renders negligible any effects of eye
movements on the spatiotemporal frequency content of the
retinal stimulus. The slow contrast modulation of the
background causes subjects to perceive the contrast of the texture
disc to be modulating in antiphase. When background
contrast is high, texture-disc contrast appears to be low, and vice
versa.

We used two nulling experiments to measure the induced
modulation of the apparent lightness of both the dark and
bright pixels of the texture disc. In the first nulling
experiment, subjects viewed the texture disc while the contrast of
the surrounding background was being sinusoidally
modulated (at 0.47 Hz) between 0 and 1. Simultaneously, the
contrast of the center disc was modulated in phase with the
background. The mean luminance of the texture disc was
kept constant in time. Subjects adjusted the modulation
amplitude of the disc's contrast until disc contrast appeared
constant in time.

Fig. 1. The ratio of disc luminance to background luminance is
greater than 1 for a and less than 1 for b. Although the discs in a and
b have the same luminance, that in a appears lighter than that in b.
(c and d) Induced contrast reduction. Like the mean-luminant discs
in a and b, the texture discs in c and d are identical: each is of
contrast 0.5. Because of the lower-contrast background, the disc in c
appears to be of higher contrast than that in d.

The purpose of the second experiment was to determine
whether or not there was a modulation of texture-disc overall
lightness induced by modulating the contrast of the texture
background. Accordingly, the contrast and the mean
luminance of the texture disc were simultaneously modulated in
phase with the background. The modulation amplitude of
texture-disc contrast was fixed at the level (determined for
each subject in the first experiment) at which the induced
contrast modulation was nulled. Then, subjects adjusted the
amplitude of texture-disc mean luminance modulation until
the overall lightness of the disc appeared constant in time.

All displays were viewed binocularly from a chin rest at a
distance of 1 m. At this distance, the texture disc was 1.35°
in diameter, centered in the 3.6° square background texture
field.

Results. We tested texture discs with mean contrasts
ranging from 0.2 to 0.5, and for all (i) the induced contrast
modulation of the texture disc was substantial, while (ii) the
induced overall lightness modulation was negligible. Thus,
modulating the contrast of the texture background induces
joint modulations of the apparent lightnesses of dark and
bright pixels in the texture disc: joint modulations that are
canceled by equal and opposite modulations of the
luminances of dark and bright pixels in the disc.

The magnitude of this illusion is illustrated graphically in
Fig. 2 for a mean texture-disc contrast of 0.4. The sinusoidal
broken line gives the contrast of the background as a function
of time. For a texture disc whose mean contrast (over time)
is fixed at 0.4, subjects found it necessary (in the first nulling
experiment) to modulate texture-disc contrast in accordance
with the solid line of Fig. 2 in order to make texture-disc
contrast appear constant in time. Thus, the texture disc
appears to remain at a constant contrast (as shown by the flat
broken line of Fig. 2) when its contrast is actually modulating
in conformity with the solid line of Fig. 2. The amplitude of
this nulling modulation (averaged for two subjects) is 45% of
the texture disc's mean contrast. Similar data were obtained
in other conditions.
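
The two time functions described for Fig. 2 can be written out as a
small Python sketch. It is an illustration under assumptions, not a
reconstruction of the figure: the starting phase of the sinusoid is
arbitrary, "amplitude" is taken to mean peak deviation from the mean
(45% of 0.4, i.e., 0.18), and the function names are hypothetical.

```python
import math

F_MOD_HZ = 0.47  # background modulation frequency given in the text


def background_contrast(t):
    """Background contrast at time t (s), modulated sinusoidally between 0 and 1."""
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * F_MOD_HZ * t)


def nulling_disc_contrast(t, mean_contrast=0.4, rel_amplitude=0.45):
    """Physical disc contrast that appeared constant to subjects.

    The disc is modulated in phase with the background; the peak deviation
    is rel_amplitude times the mean contrast (an assumed reading of
    "amplitude").
    """
    phase = 2.0 * math.pi * F_MOD_HZ * t
    return mean_contrast * (1.0 + rel_amplitude * math.sin(phase))
```

Under these assumptions the physical disc contrast swings between
about 0.22 (when the background contrast is 0) and about 0.58 (when
the background contrast is 1) while appearing constant.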

EXPERIMENT  3:  INTEROCULAR  INDUCTION 

Method. Is the induced modulation of texture-disc
apparent contrast the result of an early or a late visual process? One
way of investigating this question is to see whether or not the
induction can occur across different eyes. Interocular
induction implies that the neurons responsible for the induction
must be at the level of the cortex or a higher visual center.
Strictly monocular induction implies that the locus of the
induction is an early cortical or precortical cell population.
Accordingly, we performed a third experiment in which the
inducing background was delivered to one eye and the test
disc to the other eye. Again we used the method of
adjustment. There were four kinds of trials: (i) both disc and
background were presented to the right eye; (ii) both were
presented to the left eye; (iii) the left eye saw only the disc
and the right eye saw only the background; and (iv) the right
eye saw the disc and the left eye saw the background.
Whenever a region of one eye's retina was presented with
texture, the corresponding region of the opposite retina was
presented with uniform mean luminance.

To minimize binocular rivalry, we used the following
presentation sequence: The texture disc (which was 1.1° in
diameter) was flashed periodically. Each flash lasted 133 ms,
and flashes were separated by ^CO-ms periods of uniform
mean luminance. Two kinds of flashes were alternated:
background-on flashes and background-off flashes. On
background-on flashes the texture disc was surrounded by a (2.9°)
square texture background of contrast 1. On background-off
flashes, the texture disc was surrounded by a background of
contrast 0 (i.e., a uniform mean-luminant field). For some δ,
under the subject's control, the contrast of the texture disc
was 0.4 + δ on each background-on flash and 0.4 − δ on each
background-off flash. On each trial, the subject adjusted δ
(which was randomly initialized) until the contrast of the
texture disc on background-on flashes appeared equal to its
contrast on background-off flashes.

Results. Virtually identical data were obtained for two
subjects; the data for one subject are shown in Fig. 3. On the
trials in which both texture disc and texture background were
presented to the same eye (either both to the right eye or both
to the left), subjects had to make the contrast of the texture
disc 40% higher on the background-on presentations than on
the background-off presentations to equalize the apparent
contrast of the texture disc across alternating background-on
and background-off presentations. However, when texture
disc and texture background were presented to opposite
eyes, no such compensating adjustment was required. We
infer that the contrast of the texture background influences
the apparent contrast of the texture disc only when disc and
background are presented to the same eye. This finding
restricts the physiological location of the mechanism
underlying this induction to an early cortical or precortical neuron
population (5, 6).
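
As a worked check on the size of the same-eye adjustment, suppose
that "40% higher" means a contrast ratio of 1.4 between
background-on and background-off flashes (an interpretation; the
text does not state the reference contrast explicitly). With disc
contrasts of 0.4 + δ and 0.4 − δ, the implied setting of δ would be:

```latex
\frac{0.4+\delta}{0.4-\delta} = 1.4
\;\Longrightarrow\; 0.4+\delta = 0.56 - 1.4\,\delta
\;\Longrightarrow\; \delta = \frac{0.16}{2.4} \approx 0.067
```

That is, disc contrasts of roughly 0.467 on background-on flashes
and 0.333 on background-off flashes would have appeared equal under
this reading.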

EXPERIMENT 4: INDUCTION BETWEEN
SPATIAL FREQUENCY BANDS

Method. In a fourth experiment, we examined whether or
not texture filtered into one spatial frequency band could
influence the perceived contrast of texture in a different
spatial frequency band; that is, whether contrast induction is
narrowly or broadly tuned for spatial frequency. We spatially
filtered the texture of the disc through an ideal, octave-wide,
nonoriented filter. The background was filtered by one of
three adjacent octave-wide filters. The middle background
filter was identical to the texture-disc filter (the frequencies
passed by this filter were between 5.8 and 11.6 cycles per
degree at a viewing distance of 1 m). Examples of each of the
three textures are shown in Fig. 4.
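
The pass-bands of the three filters can be made concrete with a short
Python sketch. Only the middle band (5.8 to 11.6 cycles per degree) is
given in the text; the lower and upper bands below are inferred from
the phrase "three adjacent octave-wide filters" and should be read as
assumptions, as should the function names. The gain function shows
what "ideal" and "nonoriented" mean here: unit gain inside the band,
zero outside, depending only on radial frequency.

```python
import math


def octave_band(low_cpd):
    """Return (low, high) edges in cycles/degree of an octave-wide band."""
    return (low_cpd, 2.0 * low_cpd)


def ideal_gain(fx, fy, band):
    """Gain of an ideal nonoriented band-pass filter at frequency (fx, fy).

    The gain depends only on radial frequency, so every orientation is
    passed equally: 1 inside the band, 0 outside.
    """
    low, high = band
    radial = math.hypot(fx, fy)
    return 1.0 if low <= radial < high else 0.0


DISC_BAND = octave_band(5.8)            # 5.8 to 11.6 c/deg, from the text
BACKGROUND_BANDS = [
    octave_band(2.9),                   # one octave below (assumed edges)
    DISC_BAND,                          # same band as the disc
    octave_band(11.6),                  # one octave above (assumed edges)
]
```

To filter an actual texture, this gain would be applied to each
component of the image's 2D Fourier transform, with frequencies
converted from cycles per pixel to cycles per degree for the 1 m
viewing distance.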

Results. The results for two subjects are shown in Fig. 5.
For both subjects, the largest contrast modulation is induced
when the background texture is the same as the disc texture.
When the background texture is in an adjacent octave-wide
band, either one octave above or one octave below the disc
texture, the induction is much weaker for both subjects.
These results show that the reduction in apparent contrast of
a disc induced by a textured background is
spatial-frequency-specific. Preliminary investigations into
orientation specificity indicate that when an oriented background
texture is not in the same orientation as the disc texture, its
influence on the perceived contrast of the disc texture is
diminished.

DISCUSSION 

The results of the fourth experiment suggest that, at some
level of visual processing, neurons tuned to roughly a single
octave (or less) in spatial frequency interact across space
with similarly tuned neurons. Taken together, our results
support a model in which the output gain of such a band-tuned
neuron is normalized relative to the average response
amplitude of nearby neurons with the same frequency
tuning. Neurons differing in frequency tuning by more than
an octave have much less influence on each other.
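
One simple way to express such a normalization model is a divisive
form, sketched below in Python. The functional form and every
parameter are assumptions made for illustration (the text specifies
only that gain is normalized by the average response of nearby
neurons with the same tuning): sigma is a hypothetical constant that
keeps the response finite for a blank surround, and
cross_band_weight is a hypothetical small weight for neurons tuned
an octave or more away.

```python
def normalized_response(center_contrast, surround, sigma=0.1,
                        cross_band_weight=0.2):
    """Divisively normalized response of a band-tuned unit (illustrative only).

    center_contrast: contrast driving the unit inside its own band.
    surround: list of (contrast, octaves_apart) pairs for nearby units,
        where octaves_apart is the difference in frequency tuning.
    sigma, cross_band_weight: hypothetical parameters, not from the text.
    """
    pool = 0.0
    for contrast, octaves_apart in surround:
        # Same-band neighbours count fully; others count much less.
        weight = 1.0 if abs(octaves_apart) < 0.5 else cross_band_weight
        pool += weight * contrast
    if surround:
        pool /= len(surround)
    return center_contrast / (sigma + pool)
```

With these placeholder values, a disc of contrast 0.4 yields a
smaller response when surrounded by same-band texture of contrast 1
than when surrounded by texture one octave away, and the largest
response on a uniform background, which is the qualitative pattern
reported in Experiment 4.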

Several investigators have reported lateral inhibitory
interactions between adjacent complex stimuli: for example,
between textures of different spatial frequency (7), between
lines differing in orientation (8, 9), and between different local
velocities (10). The interactions have been small because
these paradigms required the two stimuli to differ in their
critical dimension: spatial frequency, orientation, or
velocity. In a precursor of the present paradigm, Sagi and
Hochstein (11) used a grating whose contrast was spatially
modulated analogously to luminance in the Craik-O'Brien-Cornsweet
illusion (12) to provide evidence for lateral
texture-contrast inhibition. However, their display did not
permit measurement of the effect. Interneuron texture
interactions have also been proposed on the basis of data obtained
in searching for a target among distractor items (13).
Prescient though such a theory may be, the data themselves
admit other explanations and provide only indirect
indications of texture interactions. Thus, the present experiments
illustrate a kind of robust, spatial, feature-specific interaction
that is (i) similar to gain control as observed in physiological
experiments (14) and (ii) anticipated in the explanation of
complex search tasks (13), but that has not, to our
knowledge, been unambiguously observed before with simple
textured stimuli in a psychophysical setting.

SUMMARY  AND  CONCLUSION 


Fig. 5. Induction of texture-disc apparent contrast is narrowly
tuned for spatial frequency. A nulling procedure was used with the
stimuli of Fig. 4. Ordinate indicates the difference in contrast
between a texture-surrounded test disc (of contrast 0.4) and a texture
disc matched in apparent contrast to the test disc, viewed against a
uniform grey background. Abscissa indicates the spatial frequency of
the background. Symbols indicate data for each of two subjects; each
point is the average of the last 10 reversals of a staircase.
Measurement error is approximately equal to symbol size. These data suggest
that induced contrast reduction has approximately a one-octave
spatial-frequency bandwidth.


We have demonstrated the dependence of the perceived
lightness of a point in space on lateral texture interactions in
the visual display. The perceived contrast of a patch of
texture is dramatically influenced by the contrast of
surrounding texture. In particular, for spatial texture in a certain
frequency band, the perceived contrast varies inversely with
the contrast of surrounding texture in the same band. We
showed that this lateral inhibitory effect is strictly monocular
and that it is narrowly tuned for spatial frequency. The
possible implications for perceptual theories are profound.
On the one hand, it appears that the lightness of a point in
space is a far more complex function of its environment than
had hitherto been suspected; it will take a great deal of work
to elaborate the precise spatiotemporal properties of the
textural interactions sketched out here. On the other hand, if
there are such specific lateral connections between
spatial-frequency-tuned neurons and their similarly tuned neighbors,
might there not be equally specific connections to
normalize the responses of other classes of neurons? Is
self-normalization a universal perceptual principle?
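The inverse dependence of perceived contrast on same-band surround contrast can be illustrated with a toy divisive gain control. This is only a sketch under stated assumptions: the function name `perceived_contrast`, the divisive form, and the constants `sigma` and `cross_band_weight` are hypothetical choices for illustration, not a model fitted to the data reported here.

```python
def perceived_contrast(center, surround, same_band=True,
                       sigma=0.2, cross_band_weight=0.1):
    """Toy divisive gain control for a texture patch.

    center, surround: physical contrasts in [0, 1].
    sigma and cross_band_weight are illustrative constants (assumptions),
    not values estimated from any experiment.
    """
    # Surround texture more than about an octave away gets little weight.
    weight = 1.0 if same_band else cross_band_weight
    # Divide the centre response by a term that grows with surround contrast.
    return center / (1.0 + weight * surround / sigma)


if __name__ == "__main__":
    for surround in (0.0, 0.2, 0.8):
        print(surround, round(perceived_contrast(0.4, surround), 3))
```

With no surround the sketch returns the physical contrast unchanged; raising same-band surround contrast lowers the output, and a surround in a different band has much less effect.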


*Sagi and Hochstein also reported that a light bar of a grating
adjacent to a zero-contrast area appeared lighter than other bars.
It is possible to account for this effect in terms of simple luminance
interactions; it does not strictly require texture interactions.


The authors are grateful to Barbara Dosher, Michael Landy, and
Robert Shapley for their helpful comments. This work was supported

Stimulus contrast is the dependent variable in the grating detection experiment. Indeed,
the grating detection experiment can be viewed as indicating the effective power of
quantal plus sensory noise as a function of spatial frequency. We say 'effective power'
because there is no provision in the simple stage model for input amplification that
may vary as a function of spatial frequency: input gain is incorporated into sensory

KINETIC DEPTH EFFECT AND OPTIC FLOW-I.

3D SHAPE FROM FOURIER MOTION

BARBARA A. DOSHER,* MICHAEL S. LANDY† and GEORGE SPERLING†

*Psychology Department, Box [illegible], Schermerhorn Hall, Columbia University, New York, NY 10027 and
†Psychology Department, New York University, New York, NY 10003, U.S.A.

Abstract: Fifty-three different 3D shapes were defined by sequences of 2D views (frames) of dots on a
rotating 3D surface. (1) Subjects' accuracy of shape identifications dropped from over 90% to less than
10% when either the polarity of the stimulus dots was alternated from light-on-gray to dark-on-gray
on successive frames or when neutral gray interframe intervals were interposed. Both manipulations interfere
with motion extraction by spatio-temporal (Fourier) and gradient first-order detectors. Second-order
(non-Fourier) detectors that use full-wave rectification are unaffected by alternating polarity but disrupted
by interposed gray frames. (2) To equate the accuracy of two-alternative forced-choice (2AFC) planar
direction-of-motion discrimination in standard and polarity-alternated stimuli, standard contrast was
reduced. 3D shape discrimination survived contrast reduction in standard stimuli whereas it failed
completely with polarity alternation even at full contrast. (3) When individual dots were permitted to
remain in the image sequence for only two frames, performance showed little loss compared to standard
displays where individual dots had an expected lifetime of 30 frames, showing that 3D shape identification
does not require continuity of stimulus tokens. (4) Performance in all discrimination tasks is predicted
(up to a monotone transformation) by considering the quality of first-order information (as given by a
simple computation on Fourier power) and the number of locations at which motion information is
required. Perceptual first-order analysis of optic flow is the primary substrate for structure-from-motion
computations in random dot displays because only it offers sufficient quality of perceptual motion at a
sufficient number of locations.

Kinetic depth effect   Structure from motion   Shape identification   Fourier motion


INTRODUCTION

A sequence of 2D projected images (frames) of a
moving 3D object is sometimes perceived as a
moving 3D shape. When each isolated 2D frame
is uninformative about 3D shape, but the
sequence causes a 3D shape to be perceived, this
is called the kinetic depth effect, after Wallach
and O'Connell (1953). When a computer
algorithm recovers 3D shape from a 2D frame
sequence, it is called structure from motion
(Ullman, 1979).

There are two classes of proposed models for
deriving 3D shape from 2D frame sequences;
we designate them as feature-correspondence
models and flow-field models.

Feature-correspondence  models 
Feature-correspondence models use geometric
constraints, usually coupled with assumptions
of rigidity, to derive shape. Examples
of algorithms that derive a 3D configuration
from a set of n points (or similar features)
displayed in each of m frames are Hoffman and
Bennett (1985) and Ullman (1979, 1985); see
Braunstein, Hoffman, Shapiro, Andersen and
Bennett (1987) for a more empirical treatment.
A list of visual features is identified and located
in 2D space on each frame. In this class of
model, the correspondence of point n in frame
m with equivalent point n in frame m + 1 is
assumed to be known. Using Euclidean
geometry and the assumption of object rigidity, a
3D location for each feature on each frame is
derived. The set of 3D locations determines
object shape.

Flow-field models

Flow-field models derive object shape from
local velocity information described by optic
flow fields. An object is described by many
points or other features densely scattered on its
surface and possibly throughout its volume. The
flow-field is computed from the velocities of
groups of points over a sequence of frames.
Flow-field velocities determine relative depths
and orientations and thereby object shape (e.g.
[citation illegible]; Hoffman; Koenderink &
van Doorn, 198[?]). Flow-field models assume
that a sequence of frames may be considered
not as a collection of features with
location information, but as a motion stimulus
to one or more motion-detection mechanisms.
In this article, we are primarily concerned with
determining the nature of this motion stimulus.

FIRST-ORDER AND SECOND-ORDER MOTION
SYSTEMS

We consider here three kinds of motion
detectors: two first-order detectors, which we
designate as (1) spatio-temporal motion energy
detectors and (2) gradient detectors, and (3)
second-order detectors. A first-order detector
detects motion in stimuli that would yield
motion to a local spatio-temporal Fourier analysis;
a second-order detector may detect such motion
but also detects motion in a wide class of stimuli
that do not yield directional motion under any
kind of Fourier analysis. We examine these
kinds of detectors in more detail below.

Fourier motion-energy detectors: the elaborated
Reichardt detector (ERD)

Low-level motion mechanisms are now
thought to be based on systems that approximate
a local spatio-temporal Fourier analysis
of frame sequences (Adelson & Bergen,
1985; van Santen & Sperling, 1985; Watson
& Ahumada, 1983; Watson, Ahumada &
Farrell, 1986). Indeed, whenever the
spatio-temporal frequency components of a stimulus
differ in temporal frequency, the output of these
mechanisms is simply the sum of their responses
to the individual spatio-temporal Fourier
components of the stimulus (derived from their
equivalence to Reichardt detectors; van Santen
& Sperling, 1984a, b). The Reichardt detector
(Reichardt, 1957) was the first computational
motion detector. The elaborated Reichardt
detector (van Santen & Sperling, 1984a, b, 1985)
successfully extended the basic scheme to the
prediction of human psychophysical data,
although there were earlier attempts (e.g. Foster,
1969, 1971). The motion models of Watson and
Ahumada (1983) (when elaborated) and of
Adelson and Bergen (1985) have
motion-detection mechanisms that are defined
differently but have been shown to be equivalent to
Reichardt detectors at their final outputs (van
Santen & Sperling, 1985), although the order of
intermediate operations is different.

Motion discrimination (for the discrimination
of leftward from rightward motion) now
appears to be a different process than velocity
discrimination. The elaborations of the basic
motion detection mechanism to account for
velocity discrimination are quite complex (e.g.
Watson & Ahumada, 1985; Heeger, 1987) and
involve the interplay of many elementary
motion detectors. Since all these models
ultimately depend on a basic mechanism that is
equivalent to an elaborated Reichardt detector
(ERD), we shall describe the ERD in more
detail.
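The basic Reichardt scheme that this section describes (two mirror-image delay-and-multiply half-detectors whose outputs are subtracted) can be sketched in a few lines. This is a minimal illustration, not the elaborated detector: the inputs are single sample points rather than spatio-temporal receptive fields, the delay is a pure one-frame delay, and the names `reichardt`, `net_direction`, and `moving_dot` are hypothetical.

```python
def reichardt(frames, a, b, delay=1):
    """Opponent output of one basic Reichardt detector.

    frames: list of rows of contrast values (luminance minus mean level).
    a, b:   indices of the two sample points, with b to the right of a.
    Positive output signals rightward (a-to-b) motion.
    """
    total = 0.0
    for t in range(len(frames) - delay):
        a_then_b = frames[t][a] * frames[t + delay][b]  # half-detector 1
        b_then_a = frames[t][b] * frames[t + delay][a]  # half-detector 2
        total += a_then_b - b_then_a                    # compare (subtract)
    return total


def net_direction(frames, spacing=1, delay=1):
    """Sum the opponent outputs of detectors at every location."""
    width = len(frames[0])
    return sum(reichardt(frames, a, a + spacing, delay)
               for a in range(width - spacing))


def moving_dot(n_frames, width, contrast=1.0):
    """A single dot stepping one position rightward per frame."""
    frames = []
    for t in range(n_frames):
        row = [0.0] * width
        row[t % width] = contrast
        frames.append(row)
    return frames


if __name__ == "__main__":
    rightward = moving_dot(6, 8)
    print(net_direction(rightward))        # positive: rightward
    print(net_direction(rightward[::-1]))  # negative: leftward
```

Because the two half-detector outputs are subtracted, a field that flickers uniformly without moving drives both halves equally and the opponent output cancels, which is the disambiguation of flicker from true motion mentioned in the text.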

A Reichardt motion detector consists of two
component half-detectors. One half-detector
compares the intensity at point A, time t with
the intensity at point B, time t + Δt (see Fig. 1).
The other half-detector looks at (B, t) and
(A, t + Δt). Each half-detector can detect
motion by itself; the two together have some
important advantages. First, signal motions in
opposite directions yield signals of opposite sign,
and by comparing evidence for movement in
opposite directions, they help to disambiguate
flicker and other nonmotion stimuli from true
motion.

Fig. 1. A schematic illustration of an elaborated Reichardt
detector (van Santen & Sperling, 1985), one implementation
of a spatio-temporal motion analyzer. Image intensity at
location A at time t is correlated (multiplied) by image
intensity at location B at time t + Δt (left half-detector).
Similarly, image intensity at location B at time t is correlated
(multiplied) by image intensity at location A at time t + Δt
(right half-detector). These correlation values are temporally
integrated over some time domain, and compared
(subtracted) to yield a direction-of-motion signal for that
detector. Orientation and velocity tuning are determined by the
selection of receptive fields f_A and f_B and Δt. Spatial scale
is determined by the spatial function which senses image
intensity. Outputs of populations of such detectors of
various scales, locations, and velocity-tuning must be
integrated with subsequent decision rules. Further elaborations
are required to construct velocity sensors.

To account for psychophysical data, the
spatial points A and B are replaced with
spatio-temporal receptive fields, f_A and f_B, and the pure
delay Δt is replaced with a linear filter. The
receptive fields f_A and f_B determine the spatial
orientation-tuning of the detector, and f_A and f_B
taken with the time delay Δt jointly determine
the velocity tuning. Theories of human motion
perception which we have discussed assume that
populations of such detectors exist in different
sizes (scales) and at each scale they are tuned to
different orientations and velocities. The
aggregated outputs of all these detectors are
combined by a voting (decision) rule to predict the
direction of perceived motion at each spatial
location and time.

ERDs (and hence the various equivalent
spatio-temporal motion-energy models) account
for a wide variety of critical data on direction of
motion discrimination (van Santen & Sperling,
1984a, 1985). To provide velocity sensing,
outputs of arrays of basic spatio-temporal
motion detectors must be combined (Watson &
Ahumada, 1985; Heeger, 1987), because an
isolated ERD will not function adequately as a
velocity detector. Stimulus contrast and many
factors relating to velocity tuning are
confounded in the response of any one motion
detector. Watson and Ahumada (1985) propose
direct coding of the temporal frequency of sets
of motion detectors. Heeger (1987) compares
the overall pattern of responses of a set of
motion detectors to an unknown stimulus to the
patterns produced by known training stimuli.

Gradient detectors

A second class of first-order motion detection
mechanisms uses gradients in the computation.
Examples are Limb and Murphy (1978),
Fennema and Thompson (1979), Horn and
Schunck (1981), Marr and Ullman (1981), and
Harris (1986). Basically, these models find local
areas where luminance l(x, y, t) varies as a
function of (x, y), i.e. has a nonzero spatial gradient
∇l(x, y, t) ≠ 0. The velocity v is determined by
the ratio of the change in l(x, y, t) as a function
of time to the change in l(x, y, t) as a function of
space. Gradient models do a single local
computation that combines both the Reichardt motion
detection mechanism and the subsequent
velocity step of the flow-field models.

Whenever the spatial luminance gradient is
small, velocity estimates are extremely unstable.
Therefore, Adelson and Bergen (1986) proposed
weighting the local velocity estimates by a
"confidence" value. Choosing the "confidence"
level as the local value of the squared gradient
converts the gradient computation into a
least-squares estimate of velocity (Lucas & Kanade,
1981), a computation that can be carried out

by the first-order motion-energy/elaborated-Reichardt
systems that were outlined above.
Thus, while at first glance gradient computations
seem quite different from Fourier first-order
computations, the difference vanishes
when a realistic gradient computation is made
(Adelson & Bergen, 1986).
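The confidence-weighted gradient estimate can be made concrete in one spatial dimension. Weighting each local ratio -l_t / l_x by the squared spatial gradient l_x^2 and averaging gives v = -Σ(l_x l_t) / Σ(l_x^2), which is the least-squares solution. The sketch below assumes finite-difference derivatives and a hypothetical drifting sine-wave stimulus; the function names are illustrative only.

```python
import math


def least_squares_velocity(frame0, frame1):
    """Least-squares 1D velocity estimate (positions per frame).

    Equivalent to averaging local estimates -l_t / l_x with each one
    weighted by its squared spatial gradient l_x ** 2.
    Returns None when there is no spatial gradient to work with.
    """
    num = 0.0
    den = 0.0
    for x in range(1, len(frame0) - 1):
        # Spatial gradient: central difference, averaged over both frames.
        lx = 0.25 * ((frame0[x + 1] - frame0[x - 1])
                     + (frame1[x + 1] - frame1[x - 1]))
        # Temporal gradient: difference between the two frames.
        lt = frame1[x] - frame0[x]
        num += lx * lt
        den += lx * lx
    if den == 0.0:
        return None
    return -num / den


def drifting_sine(width, shift, period=32.0):
    """Sine-wave luminance profile displaced rightward by `shift`."""
    k = 2.0 * math.pi / period
    return [math.sin(k * (x - shift)) for x in range(width)]


if __name__ == "__main__":
    f0 = drifting_sine(64, 0.0)
    f1 = drifting_sine(64, 0.5)
    print(least_squares_velocity(f0, f1))  # close to 0.5
```

Locations with a small spatial gradient contribute little to either sum, so the unstable local ratios are automatically discounted.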

Second-order motion detection

Stable perception of direction of movement
and of velocity can arise from complex stimuli
which are essentially invisible to first-order
motion detectors: they fail to report any
consistent direction (Chubb & Sperling, 1988a, b).
Motion detectors to perceive Chubb and
Sperling's motion stimuli require two stages of
linear filtering separated by a full-wave
rectification stage that computes the absolute value of
contrast. For the present stimuli, however, the
linear filtering stages are unnecessary and will be
omitted. Because of the necessity of a two-stage
analysis (first rectification with or without
filtering, then Reichardt-or-equivalent motion
detection), motion detectors that can detect such
stimuli are called second-order. Early evidence
(Chubb & Sperling, 1987) suggests that
second-order systems may operate primarily foveally
and with lower spatial resolution than
first-order detectors. Since they depend on
rectification, with inevitable loss of information,
second-order systems have higher contrast
thresholds than first-order systems (Chubb &
Sperling, 1989a, b).
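The two-stage idea (rectify, then apply a Reichardt-type detector) can be sketched with a polarity-alternating dot. The example is a toy under stated assumptions: the detector is a plain delay-and-multiply opponent unit with single-point inputs and a one-frame delay, the linear filtering stages are omitted as in the text, and all function names are hypothetical.

```python
def opponent_motion(frames, delay=1):
    """Summed opponent delay-and-multiply output; > 0 means rightward."""
    width = len(frames[0])
    total = 0.0
    for t in range(len(frames) - delay):
        for a in range(width - 1):
            b = a + 1
            total += (frames[t][a] * frames[t + delay][b]
                      - frames[t][b] * frames[t + delay][a])
    return total


def full_wave_rectify(frames):
    """Replace each contrast value by its absolute value."""
    return [[abs(c) for c in row] for row in frames]


def polarity_alternating_dot(n_frames, width):
    """Dot stepping rightward, brighter than gray on even-indexed frames
    and darker than gray on odd-indexed frames."""
    frames = []
    for t in range(n_frames):
        row = [0.0] * width
        row[t % width] = 1.0 if t % 2 == 0 else -1.0
        frames.append(row)
    return frames


if __name__ == "__main__":
    stim = polarity_alternating_dot(6, 8)
    print(opponent_motion(stim))                     # negative
    print(opponent_motion(full_wave_rectify(stim)))  # positive
```

In this toy version the unrectified detector's output for the rightward-stepping dot is negative, that is, it signals the opposite direction, whereas the same detector applied after full-wave rectification signals rightward motion.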

First-order and second-order systems and KDE

This paper asks whether the ability of humans
to perceive 3D shape from a 2D frame sequence
depends on the strength of evidence supplied to
first-order motion mechanisms. This question
stands in sharp contrast to much of the historic
work on kinetic depth effect, which emphasized
cues such as perspective (e.g. Braunstein, 1962),
dot density (Green, 1961), or occlusion (Andersen
& Braunstein, 1983) and their effect on
[illegible]. We ask whether
strong input to a first-order motion system is
necessary to support shape perception. Our
strategy is to introduce factors such as flicker or
contrast (polarity) reversal to weaken or
disrupt a first-order motion mechanism. We can
then ask whether the ability to perceive 3D
shape is especially degraded. Symmetrically we
ask, do second-order systems support 3D shape
perception?

In the experiments of this paper, kinetic depth
displays are rendered as dots scattered
randomly on a 3D surface. These are projected as
a 2D image of bright dots on a neutral gray
background. Figure 2a schematically illustrates
spatio-temporal analysis of a moving intensified
(brighter) dot on a gray background. A frame
sequence defines the stimulus as a function of
(x, y, t), where x and y represent locations in the
picture plane, and t represents frames (time).
Figure 2 simplifies the analysis by showing only
the (x, t) plane. A line in the (x, t) plane
represents the x-component of velocity. A
spatio-temporal receptive field here tuned to precisely
the velocity of the illustrated points is a core
component of one representational form of
the Fourier energy motion detectors (Adelson
& Bergen, 1985; Watson & Ahumada, 1984;
and by equivalence, the ERD, van Santen &
Sperling, 1984a, b).


Figure 2b illustrates a manipulation which
intersperses gray frames between motion
samples, but maintains the same velocity. This
reduces the amplitude of the fundamental
motion component by half and introduces many
low-amplitude motion components opposite in
direction to the fundamental. One such
opposite-direction detector is illustrated in Fig. 2b. An
alternating gray frame display is equivalent to a
half-wave rectification of a polarity alternation
stimulus (see below). For our gray-frame
stimuli, the total Fourier energy in each direction is
approximately equal. If the sensitivities to the
various spatio-temporal motion components
were equal, the energy in each direction would
balance and neutralize the Fourier system.
Empirically, at constant velocity, reducing the
number of samples (as in a gray frame versus a
standard motion stimulus) always impairs the
perceived quality of stroboscopic motion
(Sperling, 1976). Reducing blank (background
level) interstimulus intervals to about 30 msec
(and hence varying velocity) improves planar
apparent motion between two alternating
frames of random dots (Braddick, 1973, 1974)
or multi-frame sequences (Burt & Sperling,
1981).

Figure 2c illustrates a motion stimulus which
alternates polarity of the motion token between
intensities higher and lower than the neutral
(mean) gray level. Polarity alternation provides
cancelling inputs to local spatio-temporal filters
tuned to the "veridical" motion direction;
alternation, as illustrated, stimulates large-scale
detectors tuned to the opposite direction (Anstis,
1970; Anstis & Rogers, 1975; Chubb & Sperling,
1988b; Rogers & Anstis, 1975). Like the
spatio-temporal energy models, the gradient methods,
which examine changes in luminance patterns
over time, are also disrupted by polarity
reversal.

Fig. 2. (a) Schematic illustration of a simple spatio-temporal sensor operating on a moving white dot on
a gray background. One dimension of space x and time t are represented. The center (solid ellipse) has
a weight of +1; each of the flanks (dotted ellipse) has a weight of -1/2. The geometry and orientation of
the hypothetical receptive field represent the preference for a particular spatial scale, direction, and
velocity. (b) Same sensor as (a) operating on a stimulus with interleaved gray frames, and a second sensor
sensitive to the opposite velocity. The magnitude of the stimulation of the center of sensor 1 equals the
combined magnitude of the stimulation of the two flanks of sensor 2. At this scale, there is equal evidence
for both orientations, i.e. both velocities. (c) Same sensor as (a) operating on a stimulus with tokens
alternating polarity above and below the gray background level. Sensor 1 receives oppositely signed inputs
in its center and has a weak output. Sensor 2 receives inputs in its surround opposite in sign from those
in its center and therefore has a large output. Alternating polarity yields strong evidence for orientation
from upper right to lower left, i.e. for motion opposite to the direction in (a).

We investigate interspersed gray frames and
polarity reversal (and other manipulations; see
Landy, Dosher, Sperling & Perkins, 1988) that
may disrupt first-order processes. We determine
whether 3D shape extraction is disrupted. It is
also important to determine whether any such
disruption is special to 3D shape extraction
processes, or whether it can be accounted for
exactly by decrements in simpler 2D visibility
and motion tasks.

The objective measure of 3D shape recovery

The essence of kinetic depth perception is the
addition of depth information to a 2D image to
create a perception of a 3D object shape. We ask
whether kinetic depth percepts depend on
first-order motion analysis. In order to have more
than a qualitative answer to this question, it was
first necessary to develop an objective index of
3D shape perception. To this end, we (Sperling,
Landy, Dosher & Perkins, 1989) developed a
shape identification task with a very low
guessing baserate (near 2%) and a large performance
range (up to 95+%). This task requires
subjects to identify a display as depicting one of a
large lexicon (53) of three-dimensional (3D)
surface shapes. In this paper, we also use
comparison tasks such as detection, direction
discrimination and motion segmentation in several
control studies.*

*Preliminary reports of these experiments are contained in
Landy, Sperling, Dosher and Perkins (1987), Landy,
Sperling, Perkins and Dosher (1987) and Dosher, Landy
and Sperling (198[?]).

GENERAL METHODS

Apparatus

Stimuli were pre-generated and stored on a
Vax 11/750 computer that shipped images to an
Adage RDS-3000 image display system. A
Conrac 7211C19 RGB color monitor was used
for display, operating at a refresh rate of 60 Hz,
noninterlaced. Only the green beam of the
monitor was used.

Procedure

Displays were seen through a viewing tunnel
and circular aperture, which provided monocular
viewing at a viewing distance of 1.6 m. The
circular aperture was slightly larger than the
displays. The intensity, timing and content
of the displayed frame sequences are listed
for each experiment separately. Following
each display sequence, the subject pressed
keys or typed the required judgement. The
primary task was shape identification. Control
tasks include standard two-interval detection,
direction-of-motion discrimination, and motion
segmentation. Displays were viewed in mixed
lists within experiments.

The methods sections for Expts 1-6 are
presented together below, in the order in
which the results will be discussed. This allows
an uninterrupted presentation of the
arguments in the Results section, where motivation
for the particular conditions and experiments
can be found. The experiments were actually
run in the following order: 1, 3, 5, 2, 6,
then 4.

The displays, or conditions, for Expts 1-3
(the 3D shape identification experiments) are
summarized in Table 1. The displays, or
conditions, for Expts 4-6 (planar motion
experiments) are summarized in Table 2. Distinct
display types are numbered continuously in the
two tables.

METHOD: EXPERIMENT 1 (MAIN)
Identification stimuli

The main experiment compared objective
performance levels on standard kinetic depth
displays with performance on comparable displays
that disturb or weaken first-order motion cues.
The objective measure was percent correct
identification. The shape lexicon was based on
peaks, valleys, and flat regions located in one
of two triangular layouts. Figure 3a shows the
two triangular layouts on a square ground, and
Fig. 3b shows some examples of shapes. Fig. 3c
illustrates a shape movement, and Fig. 3d
indicates the size of a single display frame. Stimulus
identification consisted of reporting the layout
(Up vs Down), the sign of the bump (+ = peak,
0 = flat, - = valley) in each of locations 1, 2,
and 3, and the direction of rotation. (See
Sperling et al., 1989, for details.)
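The size of the shape lexicon follows from this response structure. Two layouts times three possible signs at each of three locations gives 2 x 27 = 54 combinations; the sketch below assumes that the all-flat surface, which looks the same in either layout, is counted once, which would yield the 53 shapes and the guessing rate near 2% quoted earlier. That assumption is our reading of the count rather than something stated in this section, and the names are hypothetical.

```python
from itertools import product

LAYOUTS = ("Up", "Down")
SIGNS = ("+", "0", "-")   # peak, flat, valley


def shape_lexicon():
    """Enumerate distinct shapes: layout plus a sign at each of 3 locations.

    Assumption: the all-flat surface is identical in both layouts,
    so it is entered only once.
    """
    shapes = set()
    for layout, bumps in product(LAYOUTS, product(SIGNS, repeat=3)):
        if all(b == "0" for b in bumps):
            shapes.add(("flat", bumps))
        else:
            shapes.add((layout, bumps))
    return sorted(shapes)


if __name__ == "__main__":
    lexicon = shape_lexicon()
    print(len(lexicon))                    # number of distinct shapes
    print(round(100.0 / len(lexicon), 1))  # chance level in percent
```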

For the 3D shape identification task, feedback
consisted of a list of the correct responses.

Table 1. Display types for Expts 1-3

Task: large lexicon shape identification

                            Motion    Density   Rotation    Intensity       Dot
Display                     cue(a)    cue(b)    speed(c)    increments(d)   lifetime(e)

Experiment 1 (Main)
 1. With density            3D        Y         Standard    1:1             30
 2. Standard                3D        N         Standard    1:1             S30
 3. With density            3D        Y         Half        1:1             30
 4. Standard                3D        N         Half        1:1             S30
 5. Alternating polarity    3D        N         Standard    1:-1            S30
 6. Alternating polarity    3D        N         Standard    0.5:-0.5        S30
 7. Alternating gray        3D        N         Half        1:0             S30
 8. Alternating gray        3D        N         Standard    1:0             S30
 9. Alternating contrast    3D        N         Standard    2:1             S30
10. Alternating contrast    3D        N         Standard    1.5:0.5         S30
11. Density only            Random    Y         Standard    1:1             1

Experiment 2 (Equated contrast)
12. Standard                3D        N         Standard    V:V             S30

Experiment 3 (Lifetimes)
 2. Standard                3D        N         Standard    1:1             S30
13. 3-Frame                 3D        N         Standard    1:1             3
14. 2-Frame                 3D        N         Standard    1:1             2

(a) 3D motion cues refers to 2D projections of 3D moving stimuli. Random refers to random motion
correspondences arising from uncorrelated new dot samples on each frame.

(b) Dot-density cues removed by minimal (<5%) dot scintillation.

(c) Standard rotation speed was ±25 deg sinusoidal rotation per 30 new frames, 15 new frames per sec with
4 sync cycles per new frame. Half rotation speed: ±25 deg sinusoidal rotation per 30 new frames; 7.5
new frames per sec with 8 sync cycles per new frame (conditions 3, 4) or 15 new frames per sec with
4 sync cycles per new frame (condition 7) (see text).

(d) The numbers encode the increments or decrements in intensification of dots on a neutral gray background.
1 refers to 1 x the standard increment level, and -1 refers to 1 x the standard decrement level. The
value to the left of the colon refers to dot intensification on odd frames; the value to the right to even
frames. For example, 1:1 means dots received the same standard increments on all frames; 1:0 means
dots received standard intensification on odd frames, and no intensification on even frames; etc. Gray
background was between 31 and 38 cd/m². Standard increments (and decrements) were between 13
and 31 extra (or fewer) µcd per dot. See the text for exact values for each subject. The value V refers
to a fraction (<1) of standard increment intensity which equates nonalternating stimuli to alternating
polarity stimuli for percent correct planar motion direction judgements (see Expt 5). Intensities for
V were between approximately 0.5 and 0.6, or between 8 and 10 µcd per dot.

(e) Lifetime refers to the number of new frames that the same dots on the 3D surface appear in during the
stimulus sequence. Since the display sequences were 30 new frames long, a lifetime of 30 frames is
maximal. The value S30 refers to nominal lifetime of 30 frames, subject to scintillation for density
control. Conditions (13) and (14) resample one third and one half of the dots in the stimulus per frame,
respectively, yielding scintillation values of 33% and 50%.

For any stimulus, there were two correct
responses, which are depth-reversals of one
another; the depth reversals are coupled with
opposite perceived directions of rotation.
Subjects were initially shown perspective drawings
of shapes and instructed in naming
performance. Subjects were trained in practice sessions
until they achieved approximately 85% correct
on the easiest stimuli.

.The  standard  kinetic  depth  display  consisted 
of  .white  dols-on  a  mid-intensity  (gray)  back¬ 
ground.  The  displays  were  300  dot  random 
subsamples  of  the  picture  plane,  displayed  with 


*The number of dots actually varied slightly from 300 due
to sampling of dots at or near the windowed edges.


an x,y resolution of 182 x 182 pixels.* Projections
were parallel. Peaks or valleys had simulated
height equal to half the side of the square
ground. The smooth surface was constructed by
smoothing of a spline interpolation over the
stimulus peaks and the ground. The surface was
initially parallel to the projection plane, and
rotated first right (or left) 25 deg, back through
to left (or right) by 25 deg, and then back
full-forward (25 deg amplitude sinusoidal rotation)
over a period of 30 new image frames.
Stimulus edges never appeared in the display
window. The displays assumed no occlusion of
dots by the 3D surface (transparency). The
standard display rate was 15 new frames (with
changed frame contents) per second. Each new
Table 2. Display types for Expts 4-6

Planar motion experiments

                              Motion  Number of  Motion            Intensity
Display                       cue*    patches†   direction         increments‡  Task

Experiment 4 (Visibility)
15-19. Standard               2D      1          L or R            5 levels     Detection (2IFC)
20-24. Alternating polarity   2D      1          L or R            ±5 levels    Detection (2IFC)

Experiment 5 (Motion direction)
25-29. Standard               2D      1          L or R            5 levels     Direction (2AFC)
30-34. Alternating polarity   2D      1          L or R            ±5 levels    Direction (2AFC)

Experiment 6 (Motion segmentation)
35. Standard                  2D      9          8L/1R or 1L/8R    1:1          Odd motion (9AFC)
36. Alternating polarity      2D      9          8L/1R or 1L/8R    1:-1         Odd motion (9AFC)

*The 2D motion cue refers to uniform field motion of a random dot field in a larger background of neutral gray
or of dynamic random dot noise. Planar motion was 1 pixel per new frame, 15 new frames per sec, or
4 sync cycles per new frame. See text for details.

†Patches were 48 x 48 pixels. Single patches were embedded in a larger background. The 9-patch displays were
arranged in a 3 x 3 square grid.

‡Dots were displayed as increments or decrements on a gray background. The intensities were varied as
percentages of the standard increments and decrements, which are labeled as in Table 1. Variable intensity
increments differed across subjects (see text).

frame was shown for 4 sync cycles, at a monitor
sync rate of 60 Hz. Half-speed displays either
showed new frames every 8 sync cycles, or at 4
sync cycles with interleaved gray frames. In the
data of Sperling et al. (1989), a similar
white-on-black display condition yielded identification
performance in the 95% range. Other conditions
modified this standard display.
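
The sinusoidal rotation and parallel projection described for the standard display can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the function names, the choice of the vertical axis as the rotation axis, and the sign convention for "right" are assumptions made here.

```python
import math

AMPLITUDE_DEG = 25.0   # rotation amplitude given in the text
PERIOD_FRAMES = 30     # one full rotation cycle spans 30 new frames

def rotation_angle_deg(frame):
    """Rotation angle at a given new-frame index (0 = parallel to the image plane)."""
    return AMPLITUDE_DEG * math.sin(2.0 * math.pi * frame / PERIOD_FRAMES)

def project_dot(x, y, z, frame):
    """Rotate a surface point about the vertical axis (assumed), then project in parallel."""
    theta = math.radians(rotation_angle_deg(frame))
    x_image = x * math.cos(theta) + z * math.sin(theta)
    return x_image, y  # parallel projection: depth is simply discarded

# Angle on each new frame of one display sequence, plus the return to the start.
angles = [rotation_angle_deg(n) for n in range(PERIOD_FRAMES + 1)]
```

A dot on the ground plane (z = 0) only compresses slightly in x, whereas a dot on a peak or valley (z not 0) sweeps back and forth; that difference in image velocity is the motion cue to depth.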

Display geometry and timing
The 3D shape display was confined to the
central 182 x 182 pixels of a 512 x 512 raster


*The linearization of the monitor depended on the average
intensification level. To equate light and dark dots
required calibration on the same gray level, and with
display conditions as closely related to the actual
displays as possible. A regular grid of one in nine pixels was
nominally assigned the dark intensity and the remaining
pixels assigned the gray background level. The decrement
(in cd/m²) relative to a uniform field of background
intensity was equated to the increment when one in nine
pixels were assigned the light intensity on a gray
background level. One in nine pixels is an approximation to
the sparse displays of the actual stimuli, while still
providing stable measurements with a UDT-161 CRT
photometer. The increment in intensification due to each
stimulus dot (in μcd/dot) was computed from the field
increment. Although a stimulus dot is nominally one
pixel, our calibrations show that intensification affects
neighboring pixels via the point spread function of the
monitor and phosphor nonlinearities.


(60 Hz, no interlace). Background luminance
was uniform over the entire 512 x 512 area. The
182 x 182 display area subtended 3.7 by 4.2 deg
at a viewing distance of 1.6 m that was controlled
by viewing tube and aperture. On each
trial, a fixation spot appeared for 1 sec, followed
by 1 sec of blank (gray) screen, then the rotating
stimulus for 2 sec (4 sec for half-speed displays).
The screen was blank until the next trial was
initiated. Responses were typed into a separate
keyboard, and feedback (correct stimulus
identification) appeared on a separate CRT.
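
As a quick check on the display geometry, the stated 3.7 by 4.2 deg extent of the 182-pixel display can be scaled to other pixel sizes. The sketch below assumes simple linear scaling (adequate at these small angles) and uses the 48-pixel patch and 200-pixel image sizes that the Methods for Expts 4 and 5 quote; the function name is illustrative.

```python
DISPLAY_PIXELS = 182                 # side of the shape display, in pixels
WIDTH_DEG, HEIGHT_DEG = 3.7, 4.2     # visual angle subtended at 1.6 m

def pixels_to_deg(pixels):
    """Return (horizontal, vertical) visual angle for a square of `pixels` on a side."""
    return (pixels * WIDTH_DEG / DISPLAY_PIXELS,
            pixels * HEIGHT_DEG / DISPLAY_PIXELS)

patch_deg = pixels_to_deg(48)    # planar motion patch of Expts 4-6
field_deg = pixels_to_deg(200)   # full image of Expt 5
```

The results fall within rounding of the 0.97 by 1.1 deg and 4.1 by 4.6 deg figures given in the text, which suggests that the pixels were slightly taller than wide.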

Calibrated intensities

The display monitor was calibrated to equate
the light and dark dots on the gray background,
i.e. the luminance energy gain of increments
and the luminance energy loss of decrements.*
Three subjects participated. For subject MSL,
the standard intensity condition consisted of
background luminance of 31.8 cd/m² (average
of 11.6 μcd/pixel) with 13.2 μcd additional (or
lowered) intensification for each stimulus dot
(at viewing distance of 1.6 m). For subject CFS,
the background was 31.0 cd/m² (average of
11.3 μcd/pixel) with increments or decrements
of 13.2 μcd/dot. For subject JBL, the background
was 38.8 cd/m² (average of 14.2 μcd/pixel)
with increments or decrements of
(a) Illustration of the upward and downward pointing triangular layout of peak and valley locations
in the shape lexicon. Members of the lexicon may have either the upward or downward layout, and either
a peak, valley or ground value at each of the three locations. (b) Examples of a number of shapes in the
shape lexicon as defined by a rectangular grid spline over peaks and valleys. Actual stimuli consisted of
parallel projections of dots sprinkled over these shapes undergoing sinusoidal rotation. Subjects were
required to identify the shape and the direction of rotation. (c) Schematic illustration of the shape
identification displays with rotation. (d) A single frame of a 2D image sequence for the shape identification
task.


20.9 μcd/dot. Note: subject CFS could not be
refracted completely to normal vision; his corrected
Snellen acuity was approximately 20/40.
All other subjects had normal or
corrected-to-normal vision.
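
The per-pixel averages quoted with each background luminance follow from the display geometry: luminous intensity per pixel is luminance multiplied by the area of one pixel. The sketch below is an illustrative check rather than the authors' calibration procedure; it assumes that the 182 x 182 pixel display spans exactly 3.7 by 4.2 deg at 1.6 m, and the function names are hypothetical.

```python
import math

VIEWING_DISTANCE_M = 1.6
DISPLAY_PIXELS = 182
WIDTH_DEG, HEIGHT_DEG = 3.7, 4.2

def extent_m(angle_deg):
    """Physical extent on the screen that subtends `angle_deg` at the viewing distance."""
    return 2.0 * VIEWING_DISTANCE_M * math.tan(math.radians(angle_deg) / 2.0)

def pixel_area_m2():
    """Area of a single pixel, allowing for non-square pixels."""
    return (extent_m(WIDTH_DEG) / DISPLAY_PIXELS) * (extent_m(HEIGHT_DEG) / DISPLAY_PIXELS)

def microcandela_per_pixel(luminance_cd_m2):
    """Luminance (cd/m^2) times pixel area (m^2), expressed in microcandela."""
    return luminance_cd_m2 * pixel_area_m2() * 1e6

backgrounds = {"MSL": 31.8, "CFS": 31.0, "JBL": 38.8}   # cd/m^2, from the text
per_pixel = {s: microcandela_per_pixel(l) for s, l in backgrounds.items()}
```

Under these assumptions the computed values agree with the quoted 11.6, 11.3 and 14.2 μcd/pixel to within rounding.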

Conditions

The main experiment included 11 display
conditions. Each of the 54 possible shape stimuli
appeared once in each of the 11 conditions, for
594 identification trials per subject. All of these
stimuli were shown in one large mixed list,
divided over 4 sessions.
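
The counts above can be reproduced by enumerating the shape lexicon as the figure caption describes it: two triangular layouts, each with a peak, valley or ground value at three locations. The sketch assumes that every layout-by-value combination counts as a separate lexicon entry, which is what the total of 54 implies; the labels are illustrative.

```python
from itertools import product

LAYOUTS = ("up", "down")                  # upward or downward pointing triangle
VALUES = ("peak", "valley", "ground")     # value at each of the three locations

lexicon = [(layout, values)
           for layout in LAYOUTS
           for values in product(VALUES, repeat=3)]

n_shapes = len(lexicon)
n_conditions = 11
trials_per_subject = n_shapes * n_conditions
chance_percent = 100.0 / n_shapes         # guessing baserate for shape identification
```

The guessing baserate of roughly 1.9% quoted with the results is simply one in 54.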

The relevant characteristics of the 11 display
conditions are listed in Table 1. All displays in
this experiment, except condition (11), depict
the motion of 3D shapes in 2D projection. An
unconstrained subsampling of points on the 3D
shapes includes density cues that result when
peaks and valleys cause dots to bunch together
in the projection of the 3D surface onto the 2D
image plane. Except in displays in conditions
(1), (3), and (13), subsampling of dots was
constrained such that local density was constant
across the display. Density cues were eliminated
from the image sequences by adding or subtracting
a small number of points on each frame so
as to equate dot density within local regions
comprising approximately 1/10 x 1/10th of the
stimulus area. Constant-density subsampling
introduced minor levels of apparent scintillation.
The amount of scintillation can be expressed
as the average percentage of dots not
maintained from frame n to frame n + 1, or
equivalently, in the expected lifetime of dots.
Over all the density-controlled (no density cue)
displays in the experiment, the average scintillation
was 5%, yielding an expected dot lifetime
of 20 frames for the dots of frame 1. (These
displays are indicated as S30 in Table 1.)
Condition (11) extracts the local density cue
in (1), but eliminates systematic motion
information: time- and position-dependent density
is generated by random sampling from the
rotating 3D shape with a new random sample
for each frame (dot lifetime of 1 frame). This
destroys systematic motion cues, but maintains
local variations in dot density under
rotation.
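
The equivalence between scintillation and expected dot lifetime can be made explicit. The sketch below assumes that each dot is replaced independently with the same probability on every frame (a geometric model, which is an assumption made here rather than a statement from the text) and ignores the 30-frame limit of the display.

```python
def expected_lifetime_frames(scintillation):
    """Mean number of frames a dot survives when a fraction `scintillation` is replaced per frame."""
    if not 0.0 < scintillation <= 1.0:
        raise ValueError("scintillation must be in (0, 1]")
    return 1.0 / scintillation

def survival_fraction(scintillation, frame_steps):
    """Fraction of the original dots still present after `frame_steps` frame-to-frame steps."""
    return (1.0 - scintillation) ** frame_steps

lifetime_5_percent = expected_lifetime_frames(0.05)    # density-controlled displays
lifetime_100_percent = expected_lifetime_frames(1.0)   # density-only condition (11)
```

With 5% scintillation the model gives the 20-frame expected lifetime quoted in the text; with a fresh sample on every frame the lifetime is 1 frame.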

Most displays depict a standard rotation
speed as described above. In conditions 3 and 4,
half-speed rotation is produced by displaying
each new frame for 8 (sync) repetitions (instead
of 4 in the standard condition). The half-speed
gray frame condition (7) is accomplished by
interleaving 4 repetitions of each new frame
with 4 repetitions of gray frame. Full-speed gray
frame condition (8) is accomplished by interleaving
4 repetitions of every other new frame of
the standard stimulus with 4 repetitions of gray
frame.
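
The four timing schemes just described can be written out as a list of what is shown on each 60 Hz sync cycle. This is a sketch for illustration: the condition names and the use of `None` for a gray frame are choices made here, not identifiers from the paper.

```python
GRAY = None        # marker for a blank (gray) frame
SYNC_HZ = 60       # monitor sync rate
N_FRAMES = 30      # new frames in one standard stimulus sequence

def schedule(condition):
    """Return, per sync cycle, the index of the stimulus frame shown (or GRAY)."""
    cycles = []
    if condition == "standard":
        for f in range(N_FRAMES):
            cycles += [f] * 4
    elif condition == "half_speed":
        for f in range(N_FRAMES):
            cycles += [f] * 8
    elif condition == "half_speed_gray":
        for f in range(N_FRAMES):
            cycles += [f] * 4 + [GRAY] * 4
    elif condition == "full_speed_gray":
        for f in range(0, N_FRAMES, 2):    # every other new frame
            cycles += [f] * 4 + [GRAY] * 4
    else:
        raise ValueError("unknown condition: " + condition)
    return cycles

def duration_sec(condition):
    return len(schedule(condition)) / SYNC_HZ
```

The full-speed gray condition keeps the 2 sec duration and rotation rate of the standard display but shows only half of its frames, whereas the half-speed gray condition shows every frame and takes 4 sec.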

Standard displays depict the 3D shapes by
displaying bright dots of a selected standard
intensity of increment on a neutral (gray) background.
Intensity listings in Table 1 refer to a
multiple of the standard dot intensification,
positive for increments and negative for decrements.
In alternating polarity displays, the dots
are bright in odd frames, and dark on even
frames (labelled 1:-1). In alternating gray
displays, gray background is displayed on all
even frames (labelled 1:0). Other non-standard
increments serve as controls.
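
The odd:even intensity labels can be read as per-frame multipliers of the standard increment. A minimal sketch, assuming frames are numbered from 1 so that the first value applies to odd frames; the function names are illustrative.

```python
def frame_multiplier(label, frame_number):
    """Multiplier of the standard increment for a frame, given an 'odd:even' label."""
    odd_text, even_text = label.split(":")
    odd, even = float(odd_text), float(even_text)
    return odd if frame_number % 2 == 1 else even

def dot_intensity_change(label, frame_number, standard_increment):
    """Signed change in dot intensity relative to the gray background."""
    return frame_multiplier(label, frame_number) * standard_increment
```

For example, `dot_intensity_change("1:-1", 2, 13.2)` gives a decrement of 13.2 on an even frame, and `frame_multiplier("1:0", 2)` is zero, which leaves the dots at the gray background level.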

METHOD: EXPERIMENT 2 (EQUATED CONTRAST
IDENTIFICATION)

Conditions

The task in this experiment was 3D shape
identification; it was conducted with displays
that had been equated for discrimination of
motion direction by reducing dot intensity by
an amount determined from Expt 5. Subjects
viewed standard 3D shape identification displays
(Table 1, condition 2) in which the dot
increments had been reduced (condition 12).
The data for the standard (non-alternating)
condition in Expt 5, by interpolation, allowed
the selection of an increment intensity which
would approximately equate the percent correct
motion direction judgement of the standard
condition with polarity alternation stimuli at
full intensity increments and decrements. This
equated-direction-discrimination value was
determined separately for each of the two subjects.

Each of the 54 identification stimuli was
presented in random order.
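
The interpolation step can be illustrated with a short sketch. Linear interpolation between measured intensity levels is an assumption (the text says only "by interpolation"), and the numbers below are hypothetical values chosen for illustration, not data from the experiments.

```python
def equated_intensity(levels, percent_correct, target):
    """Intensity fraction at which the standard display reaches `target` percent correct.

    Linearly interpolates between adjacent measured points.
    """
    points = list(zip(levels, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 != y1 and min(y0, y1) <= target <= max(y0, y1):
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target lies outside the measured range")

# Hypothetical psychometric data for illustration only; not values from the paper.
levels = [0.2, 0.4, 0.6, 0.8, 1.0]             # fraction of standard intensity
standard_pc = [55.0, 65.0, 80.0, 90.0, 95.0]   # percent correct, standard dots
alternating_full_pc = 75.0                     # percent correct, full-intensity alternation

v = equated_intensity(levels, standard_pc, alternating_full_pc)
```

With these made-up numbers the equating fraction is about 0.53 of the standard increment; the role it plays corresponds to the value V described in the Table 1 footnote.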

Display geometry and calibrated intensities

Viewing conditions were the same as those
described in Method Experiment 1. Calibrated
intensities were: for MSL, the background
intensity was 31.0 cd/m² with increment/decrement
intensity of 8.8 μcd/dot. For JBL, the
background intensity was 38.0 cd/m² and
increment intensity was 9.6 μcd/dot.

METHOD: EXPERIMENT 3 (LIFETIMES)
Conditions

This experiment compared three conditions in
which the lifetimes of the dots were 2 frames, 3
frames and S30 frames (continuous) (conditions
14, 13 and 2, respectively, under Expt 3
in Table 1). See Fig. 6a for an illustration.
New dots were subsampled randomly, with
additional subsampling to eliminate density
cues for all conditions of this experiment. The
task was 3D shape identification. Each of the 54
shapes appeared once in each condition, for 162
identification responses per subject.

In the 2-frame displays, each subsampled dot
appears for exactly 2 consecutive new frames.
Half of the dots are replaced with another
random subsample on each new frame. This
introduces 50% scintillation (density control
does not require additional subsampling). In the
3-frame displays, each dot appears for exactly 3
consecutive new frames. One-third of the dots
are replaced with another random subsample on
each new frame, for 33% scintillation. In the
S30-frame displays, each dot remains visible
for all 30 new frames of the display, with
exceptions to eliminate the density cues, which
introduced 5% scintillation. This is identical to
condition (2) of Expt 1.
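
A limited-lifetime display of this kind can be sketched by giving each dot a fixed lifetime and staggering the replacement times across dots, so that a constant fraction is replaced on every new frame. The staggering scheme and names below are assumptions made for illustration, and the sketch is intended for the short (2- and 3-frame) lifetimes only.

```python
def lifetime_schedule(n_dots, lifetime, n_frames):
    """Return, per frame, the identity of the surface sample shown in each dot slot."""
    phases = [slot % lifetime for slot in range(n_dots)]   # staggered replacement times
    ids = list(range(n_dots))
    next_id = n_dots
    frames = [list(ids)]
    for frame in range(1, n_frames):
        for slot in range(n_dots):
            if (frame + phases[slot]) % lifetime == 0:
                ids[slot] = next_id       # replace with a newly sampled surface point
                next_id += 1
        frames.append(list(ids))
    return frames

def scintillation(frames):
    """Average fraction of dots not maintained from one frame to the next."""
    replaced = [sum(a != b for a, b in zip(prev, cur)) / len(cur)
                for prev, cur in zip(frames, frames[1:])]
    return sum(replaced) / len(replaced)
```

With 300 dots, a 2-frame lifetime yields 50% scintillation and a 3-frame lifetime yields 33%, matching the values stated for conditions (14) and (13). Dots present on the first frame may live for fewer frames than the nominal lifetime because of the stagger.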

Display geometry and calibrated intensities

The identification stimuli, subjects, and viewing
conditions are identical to those listed in
Method Expt 1. Calibrated intensities were
identical to those in that experiment.

METHOD: EXPERIMENT 4 (VISIBILITY)
Conditions

Conditions for Expts 4-6 are listed in Table 2.
This experiment required subjects to detect the


presence of a planar motion stimulus in a
two-interval forced choice (2IFC) paradigm (Fig. 7a).

The subject selected which interval contained
a stimulus rather than only the background
(conditions 15-24). The five
conditions of each type differ in the
fraction, at five levels, of the "standard"
(condition 2, Expt 1) dot intensity (increment or
decrement). For MSL the intensity conditions
were 17%, 25%, 33%, 42% and 50% of standard.
For JBL, the intensity conditions were
33%, 50%, 67%, 83% and 100% of standard.
The 10 conditions each occurred 20 times per
block in random order, for 5 blocks, or a total
of 1000 trials per subject.
Display geometry and calibrated intensities

Each interval of the display consisted of a
½ sec fixation spot, ½ sec blank screen, followed
by 1 sec (15 frames at 15 frames/sec) of
stimulus. Non-motion intervals displayed uniform
gray fields. Motion intervals displayed a
sequence of approximately 17 random dots in
a 48 x 48 pixel (0.97 by 1.1 deg) patch
(0.0075 dots/pixel, or 16 dots/deg² average
density) moving left or right by 1 pixel/frame, or
approximately 0.35 deg/sec. The viewing conditions
were identical to those described above
for Experiment 1. For MSL, the background
intensity was 31.0 cd/m², and the intensity
increment or decrement was 13.2 μcd/dot at
100% standard intensity. For JBL the background
was 32.0 cd/m² and the increment or
decrement was 16.9 μcd/dot at 100% standard
intensity.
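
The dot counts and densities quoted for the motion patch are mutually consistent, as the following arithmetic sketch shows. It uses only the figures given above, together with the 300 dots in 182 x 182 pixels of the shape displays for comparison; the variable names are illustrative.

```python
PATCH_PIXELS = 48
PATCH_DEG = (0.97, 1.1)      # patch extent in degrees of visual angle
DOTS_PER_PIXEL = 0.0075

expected_dots = DOTS_PER_PIXEL * PATCH_PIXELS ** 2            # dots per patch
dots_per_deg2 = expected_dots / (PATCH_DEG[0] * PATCH_DEG[1])

# Density of the 3D shape displays, for comparison.
shape_dots_per_pixel = 300 / 182 ** 2
```

The patch holds about 17 dots at roughly 16 dots/deg², and its density of 0.0075 dots/pixel is close to the approximately 0.009 dots/pixel of the shape identification displays.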

METHOD: EXPERIMENT 5 (MOTION DIRECTION)
Conditions

The task in this experiment was discrimination
of leftward from rightward motion of dots
within a square in the center of a larger field
(Fig. 8a). The stimuli were a uniform field of
dots of approximately the same density as
the shape identification stimuli of Expts 1-3.
The drift speed of dots in the central square
(0.35 deg/sec) was approximately the average of
ground dots at the edges of the shape identification
stimulus, or approximately one-eighth of
the peak velocity in that stimulus. In the 3D
shape identification stimuli, peak speed is


Display geometry and calibrated intensities
Each trial consisted of a 1 sec spot, ½ sec blank
gray frames and 1 sec motion display, followed
by a blank frame during the response interval.
The image was 200 x 200 pixels, 4.1 by 4.6 deg
at a viewing distance of 1.6 m. This included a
dynamic noise background, with a moving center
of 48 x 48 pixels. Dot density was approximately
16 dots/deg², and drift velocity was
1 pixel/frame, or approximately 1.4 min arc/
frame, or 0.35 deg/sec. The viewing conditions
and calibrated standard intensities are the same
as those in Method Expt 1.


METHOD: EXPERIMENT 6 (MOTION
SEGMENTATION)

Conditions

The task in this experiment was motion segmentation.
Each display consisted of a 3 x 3
grid of patches of planar motion, with eight
patches drifting left (in a left-drifting surround)
and one patch drifting right, or vice versa
(Fig. 9a). The subject's task was to name the
location and direction of the odd motion.

There were two conditions in this experiment:
bright dots of standard intensity (35), and dots
of alternating polarity (36) on a gray ground.

For JBL, all conditions were intermixed, such
that each of three blocks showed 72 stimuli from
condition (35), and 54 stimuli from each of
condition (36) and a third condition which we
do not report here. For MSL, two blocks had 90
trials each of conditions (35) and (36).
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aoc.Ae  saaK  as  ii.fanins  OfcnBcais.  For 
MSt,  hyFfroairf.inasiqf  mx  3tJSi^'ir, 
wah  jamwiir  dtrjraaae  rarati^  ~  ef 
IXlptd'iK.fer  coaiSams  (I)  aad  (2).-cFor 
JBLIndpnaad  iaifa^ns3iJ(>^%r.aia. 
iBcitaacBCjdforaxiiii  iMatdQr  of  193  acd.'dec 
iacoodaoss(l)Kd(3j,aad9j6ficd31o:&c'ilii£  - 
ci^azced  cDodEaoB  C3). 
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Uea^iahc 

^osnaufo  of  the  es^  (Exftimml). 

When a shape is depicted by a random sampling
of surface points that then undergo
rotation, local regions of higher or lower dot
density change over rotation. To assess the
possibility that these changes in dot density per
se can be used as cues to 3D shape, identification
performance for image sequences that include
both motion and density cues is compared
to those in which density cues are eliminated, or
in which only the density (but not the motion
cues) are presented. (See Method Expt 1
for experimental details.) Relevant individual
subject data are shown in Fig. 4. (These results
were initially reported in Sperling et al., 1989.)
Eliminating density cues from motion sequences
has only a small effect on the subjects' ability
to identify shape from strong structure-from-motion
stimuli, which may actually be due to
introduction of scintillation. One of the three
subjects (MSL) was able to perform significantly
above the 1.9% guessing baserate (29.6%) with
density cues alone in the absence of motion
cues, by using a sophisticated guessing strategy.
Since our conditions involve the disruption
of strong input to low level motion systems, it
was desirable to eliminate any cue, such as
density, which might contaminate estimates of
shape identification with weak structure from
motion image sequences. Therefore, all other
displays exclude the density cue. All critical


F%,4.gapck5fsftfeaacepcrfscsasccfecfcfr!alfeft»>^ 
acd  »tApc;  ^szaay  caeu  *aS  foe  Use  cSfss4>  ceS* 
Feifcssuxe  ntfo  »  Uees  0  lo  1(0%.  «i:b  a 
1.9%.  Tbr  lifer  p»seb>!ao«  d»u  for 

image sequences constructed to have uniform
dot density in local regions of the image
plane.

Standard sequences: motion without density
cue, standard and half speed (Experiment 1).
Percent correct 3D shape identification is shown
in Fig. 5. Standard errors of all proportions in
the figure are less than 6%; chance is 1.9%. The
3D shape task is illustrated in Fig. 3. Standard
sequence conditions display sampled dots which
are a fixed increment brighter than the gray
background. Percent identification levels are
shown for "standard" rotation speed (sinusoidal
rotation of amplitude 25 deg and period
30 frames, at frame rate of 15 new frames/sec),
and for half speed (7.5 new frames/sec). The
Fig. 5. Shape identification performance for standard displays,
alternating gray frames, alternating polarity
and a number of control displays. Performance
ranges from 0 to 100%, with a guessing baserate of 1.9%.
The three panels show the individual subjects.
(Control unavailable for CFS.)


average percent correct is similar for both
speeds, with half-speed slightly less for subjects
JBL and CFS.

Gray frame dilution (Experiment 1). By interspersing
a background level (gray) blank frame
between each frame depicting points of the
object, we presented direction-ambiguous
information to first-order motion mechanisms while
maintaining the visibility of the dot features in
any given frame (see Discussion section: Fourier
Analysis of the Stimuli). There were two variants
of this manipulation: one which equated
the viewing time for each new image seen, but
consequently slowing the rotation rate of the
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iMh  of  tee  variMU  drsaroyrd  Ac  aMiy 
io  lecweo'  Aape  iteaate  liroga  ihc 

Mates  (n  Fi(r  ^  (Mf  Mc  «f  tee  saljccn 
(MSL)  aaiMjaael  lyriicaMly  atec  chaaee 
pesiennMCifimsate  at  11%)  oa  aiofe  se- 


ttis  K^RseMS  aM«c  CMKit  idcanKaiMMRr  per* 
feraaaoe.  k  it  dnantiedy  wane  Ikra  tit 
stearate  pcifeitiatc  at  aeaiiy  Mfi  akh 
she  aa|»nwhrJ  Maadad  trqacaec.  Kecate 
speed  a  , tee  laaget  had  ealjr  sad  eaeot 
ea  cah^  sttadaid  or  jataitiof  gray  coadi- 
tet.  aad  that  caa  aof  accoaa;  '1^  dhe 
aafiri  ofaheiaaiiag  jay  fcmeteaTP  shape 
perfonanee. 

Alternating polarity (Experiment 1). In polarity
alternation, the stimulus tokens (subsampled
dots on the shape surface) alternate between
intensity increments and decrements (light on
gray then dark on gray) on each frame. Adjacent
image frames primarily support motion
signals with the incorrect sign in the first-order
system. Analysis of the change in location of
these motion signals over many frames, or
analysis following some form of rectification
(second order, or non-Fourier analysis, see
Chubb & Sperling, 1988b) could support the
correct motion interpretation. Two levels of
polarity alternation were examined, one with
light dots equal in intensity to those in standard
image sequences and one with light dots half the
intensity of those in standard image sequences.
In both cases, the dark dots were symmetrically
below the background level. Again, disrupting
the input to low level motion systems reduced
shape identification performance to near guessing
baserates. Only one of three subjects (MSL)
retained above-chance identification on polarity
alternation stimuli (average of 10%).
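
Why polarity alternation and gray-frame interleaving disrupt first-order motion signals can be illustrated with a toy correlation detector. The sketch below is a deliberately simplified stand-in (a one-dimensional, adjacent-frame correlator applied to contrast values relative to gray), not the Fourier analysis presented in the Discussion; the dot spacing, names and frame count are illustrative assumptions.

```python
WIDTH = 200
DOT_POSITIONS = range(0, WIDTH, 10)   # regular spacing keeps the example deterministic

def frame(t, multiplier):
    """Contrast image (0 = gray) at frame index t; dots drift right 1 pixel per frame."""
    image = [0.0] * WIDTH
    for p in DOT_POSITIONS:
        image[(p + t) % WIDTH] = multiplier
    return image

def sequence(odd, even, n_frames=8):
    """Frames numbered from 1: odd frames use `odd`, even frames use `even`."""
    return [frame(t, odd if (t + 1) % 2 == 1 else even) for t in range(n_frames)]

def direction_signal(frames):
    """Rightward minus leftward correlation between successive frames."""
    total = 0.0
    for a, b in zip(frames, frames[1:]):
        right = sum(a[x] * b[(x + 1) % WIDTH] for x in range(WIDTH))
        left = sum(a[x] * b[(x - 1) % WIDTH] for x in range(WIDTH))
        total += right - left
    return total

standard_signal = direction_signal(sequence(1.0, 1.0))    # 1:1
polarity_signal = direction_signal(sequence(1.0, -1.0))   # 1:-1
gray_signal = direction_signal(sequence(1.0, 0.0))        # 1:0
```

In this toy model the standard sequence yields a positive (rightward) signal, polarity alternation yields a signal of the same size but opposite sign, and gray-frame interleaving yields no adjacent-frame signal at all, in line with the direction-reversed and direction-ambiguous descriptions in the text.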

Intensity alternation stimuli (Experiment 1).
Introducing blank (gray) frames between every
stimulus frame in an image sequence causes
ambiguous signals in the first-order motion
systems. Introducing polarity reversal caused
direction-reversed signals in the first-order
motion systems. Both manipulations also introduce
whole-screen flicker: stimulus frames including
intensified dots appear every other new
frame for a flicker frequency of 7.5 Hz. We
included two contrast alternation (without polarity
alternation) conditions, which also exhibit
whole-screen flicker at 7.5 Hz, both of which
retain performance levels close to that of the
standard stimulus.

One flicker control alternated the intensity of
stimulus points between the contrast level in
normal displays and twice this. This stimulus is
the sum of the standard stimulus and the gray
frame stimulus. The other flicker control alternated
between 1.5 and 0.5 the standard levels.
This stimulus is the sum of a half contrast
standard and the full-contrast gray frame
stimulus. Alternatively, this stimulus can be
decomposed into a standard stimulus plus a
half-contrast polarity alternation stimulus (by
linearly adding stimuli). The performance
levels on both control conditions are quite
consistent with a Fourier power (first-order)
analysis of these sequences (see the Discussion).
Thus, addition of flicker per se does not account
for the decrements in performance for alternating-gray
and alternating-polarity displays.
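
The decompositions of the 1.5:0.5 flicker control can be verified by treating each display as a sequence of per-frame multipliers of the standard increment and adding sequences linearly. This is a sketch of the arithmetic only, with illustrative names; frames are assumed to be numbered from 1.

```python
def sequence(odd, even, n_frames=6):
    """Per-frame multipliers of the standard increment for an 'odd:even' display."""
    return [odd if n % 2 == 1 else even for n in range(1, n_frames + 1)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def scale(k, a):
    return [k * x for x in a]

standard = sequence(1.0, 1.0)      # 1:1
gray_frame = sequence(1.0, 0.0)    # 1:0
polarity = sequence(1.0, -1.0)     # 1:-1
control = sequence(1.5, 0.5)       # flicker control alternating between 1.5 and 0.5

half_standard_plus_gray = add(scale(0.5, standard), gray_frame)
standard_plus_half_polarity = add(standard, scale(0.5, polarity))
```

Both sums reproduce the 1.5:0.5 control exactly. The second decomposition makes the logic of the control clear: the standard component carries the correct first-order motion signal, and it outweighs the half-contrast reversed component.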

Equated intensity control (Experiment 2). We
have demonstrated that gray frame alternation
and polarity alternation both severely disrupt
the ability of subjects to extract 3D shape from
an image sequence which allows highly accurate
3D shape identification under standard display
conditions. However, perhaps this disruption is
not unique to the recovery of depth information.
Perhaps it simply reflects a general disruption
in visibility or motion discrimination. In
order to control for this possibility, we constructed
equated-intensity controls based on
performance in simple direction-of-motion
discrimination. The details of the direction
discrimination data are described below and in the
Method for Expt 5. By reducing the intensity
(lowering contrast and hence visibility) of a
standard (light on gray background) planar
motion stimulus, it is possible to make it equivalent
to a full-intensity polarity alternation
stimulus for the purposes of left-right
direction-discrimination. The direction-discrimination
displays present a patch of moving dots of
approximately the same area as a bump in the
3D shape displays. Having found the equivalent
reduced-contrast standard stimulus, we then
compared 3D shape discrimination for the two
stimuli (reduced-contrast normal, full-contrast
polarity alternation). These results are shown on
the extreme right in Fig. 5 for MSL and JBL. If
the effect of polarity alternation can be attributed
solely to a visibility-related decrement,
then the equivalent intensity condition should
have yielded equal shape identification performance
to that for polarity alternation. In fact,


Io»iri^iaiaKj.sKitiisdr2SiaJsS=rriiea- 
tSaosQ.  bea  lewis  wire 
sfeijc  idearisasac  frees 
uaa  Tbe  perceas  ifeaaSasSoa  fx 

saadKieqssaJeaicaeaKyzadprfjrjtyzSar- 
azriog  ooedErioes  were  tj%.  45*,  esS  15%, 
re^ie^riy,  Ibf  llSU  zad  dSn.  55%  zsd  <■*., 

si^pethriy.  foe  JBL.  fSaeOxri  error  of  the 
4a?*  zad  33?*  cquited  ccotrzst  coa£rioas  is 
±6%.) 
We have shown that manipulations which
disrupt input to low-level motion analysis
also eliminate the ability to perceive
three-dimensional shape, at least in the conditions of
our experiments. It is interesting to contrast this
with a manipulation which eliminates the ability
to track individual image features (dots) over
multiple frames. Models that emphasize the
extraction of specific image features and their
image plane location (Hoffman & Bennett,
1985; Ullman, 1979, 1985, etc.) might predict
that eliminating feature stability should have an
equally large impact on the shape identification.
We investigated this hypothesis by comparing
feature stability over a full 30 frame image
sequence with stimuli in which features (surface
dots) were stable for only 3 and 2 frames, after
which they were replaced with a different random
sample of dots (Fig. 6a). The shape identification
data are shown in Fig. 6b. For two
subjects (MSL, CFS), reducing tracking to two
frames (and increasing scintillation substantially)
had very little effect on performance. A
third subject's (JBL) two-frame lifetime identification
performance was about 54% of normal.
While this was a 2 x loss, it was a much smaller
loss than the 10 x loss induced by polarity
alternation for JBL. Thus, feature-tracking
models of the kinetic depth effect appear unable
to account for the performance in our experiments.

Motion visibility, discrimination and segmentation

This section compares the disruptive effects of
polarity alternation on 3D structure-from-motion
(shape identification) to its effects on
visibility, direction-of-motion discrimination
and segmentation.

Motion visibility (Experiment 4). Subjects
were asked to detect which of two temporal
intervals contained a motion stimulus and
which contained a uniform field of background
intensity. The motion stimulus was either a
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FKct  dxi  XGaa  xivixs  a  i3  fasei  e&  ^ 

la  tic  Ktritsd  3^!bs»  coa^slce.  ceeud  £es 
iasiy  c»  asaaS)  kseiacti  iU  KoSa^x.  fa  ts 
ajaisii£s*Jsac£2&fas^\a^^at^fkAitsfie£ 
iiaxes^exlifataofiiaef^iiiaciU-ttsfiiaity 
taaia  tsajic.  (h]  rcrtcsl  UssHi^'s  Ck  Ibc: 
saijcas  ia  oA  of  l!iS  £&:=>;  ccate.^  Ga»i^ 

t>iKTt»  i>  I  .$%.  SSapc  >i;a:iSaiwa  It  E:iV  a.%»3 
ifrase  Tfcr  M33  3aA=e  ia*>  br  a  coa«- 

qara»  of  sdauSiun  BCl  losa  of  irajKto:}  ia<b;»aoa. 

random dot field moving at uniform velocity to
the right or left, or a polarity alternation version
of the same stimulus. The display is schematically
illustrated in Fig. 7a. The size of the region
was approximately that of a single peak or
valley in the shape displays, of approximately
the same dot density and a representative velocity
(between that of the ground and maximal
velocity of a peak or valley). (See Method Expt
4 for details.) Detection may reflect contributions
by non-motion systems. For example, Watson
and Ahumada (1985) claim that detection of


£9!ays  (arcaded  across  cesSnsa)  vedded 
75%  asd  74%  .conca  deeecaoa  re^>e£avelv-. 
For  JBL.  i&e  Spues' were  S5%  xad  99li.  re^test- 

miir.  Wbctcas  pobshy  aftersatioo  abaos;  &- 
iticxi  the  a^y  lo  cxiiaci  ihfee-dhateiBoad 
SI  nay  s^aK-  istpeov'e  sameiss 
dcacdoo  rcbihe  to  standard  £spiays  for  <ks 
cosdhsoas.  Dcaclioa  accusacy  «n!i  pabtisv 
aberoanoa  s  cssesisally  perfea  ai  uaesshy 

kvds  axsparabte  to  lisose  nsed  ss  the  3D  s!a;4 
experimcDi  (5ISL  at  50%  istctisriy  is  95%  cor¬ 
rect.  assd  JBL  at  IIXKi  intestsety  is  96%  ccrrccl). 

Hie  senaB  eflect  of  polasity  aberaatios  on 
deactioo  perfonsianee  k  cosskteni  nitb  ibe 
ocar-qsimcuy  of  tnereeaent  and  decremeas 
moving stimuli with velocity less than 2 deg/sec
is performed by non-motion systems.

The detection data are shown in Fig. 7b.
Across a range of stimulus intensity increments
(17%-50% of standard level intensity for MSL,
33%-100% of standard level intensity for JBL),
the effect of polarity alternation was small.
For MSL, standard and polarity alternation
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fif-  7,  (a)  llluvtralion  of  the  iKO-imcnat  forced  ihoxv 
(2IFC)  raradifnr  for  the  motion  visit^iu  larL  Swticetc 
jwffnf  ohich  1  sec  interval  cor.lair.ed  a  stimulus,  and  uhn.*' 
■  interval  vvas  blank,  (by  Percent  detection  of  a  pljnai  motior 
displai  in  the  2IFC  task.  Detcetton  is  measured  fc:  v3a'-.d--d 
and  polariry  alternation  tmare  sequenecs  a>  a  fu-.iior  o^ 
dot  iniensit)  (evprevsed  as  a  ptteentarc  of  a  wra:.'' 
intensity)  Guessing  Overate  tv  Sn', 
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i^esbcHsios^i-i^^pcicsaldacclioaex- 
psinasu  (Knasiosir.  1950:  Jtsshlnss.  lOKIj 
-Roofs.  l97^ah!ioi4hsoEiies:od3Sfiaddea£- 
jmu  s^dy  0^.10  dcuci  (Pa-.d  £  Jooes. 
196S;  Shon.  19$^  Ahmstindy.  ihe  fenda- 
mesa]  IBckei  compooesi  of  He  pobncy'atur- 
oatioD  s&nulos  is  7.5  Hz.  appnnisiatdy  at  ihc 
peak  of  the  ISidxr  scssilhity  fossaioa  (K’aisoa. 
1956).  «1^  sa^cw  that  polarity  ahenaiion 
may  bc^most-  sensiKi^-  detected  Sickcr- 
senshhe  mcdiamsss. 

Direction-of-motion discrimination (Experiment 5). Subjects were asked to discriminate the direction of motion (right or left) of a small patch of random dots moving with uniform velocity (Fig. 8a). The dots were either always bright against the background, or alternated polarity from frame to frame. Discrimination was examined over a range of intensity increments (or decrements) per dot. (See Method Expt 5 for details.)

Direction discrimination data are shown for two subjects in Fig. 8b. Polarity alternation impaired subjects' ability to discriminate motion direction: averaged over intensity level, standard and polarity alternation conditions yielded 85% and 69% correct, respectively, for subject MSL, and 90% and 67%, respectively, for JBL. However, at the intensity levels that were investigated in the shape identification experiments, levels of direction discrimination for polarity alternation stimuli were good: 87% correct for MSL and 88% correct for JBL. Intensity-based decrements for standard displays in this experiment were used to select the "equated intensity" condition listed above for shape identification.

The patch size in the direction-of-motion displays was selected to be approximately the size of a bump or depression in the shape displays. The speed of drift (0.35 deg/sec) was selected to be representative of the modest speeds in many points of the 3D shape displays, where peak speeds may range up to 2.5 deg/sec. Based on data from direction of motion discrimination in near-threshold sine wave stimuli (Ball & Sekuler, 1979; Burr & Ross, 1982; Green, 1983; Watson, Thompson, Murphy, & Nachmias, 1980) and theoretical computations on direction of motion discrimination for random dot stimuli (Nakayama, 1985; van Doorn & Koenderink, 1982), we picked the weakest motion stimulus that could be derived from the 3D shape task: the slowest reasonable speed and approximately the same number of dots in the displays to be comparable. That the direction of motion of this stimulus is nearly always judged correctly at standard intensities implies that direction of motion at a single location is almost completely intact when 3D shape identification is at zero.

Fig. 8. (a) Schematic illustration of the motion direction discrimination task. Outer dots were dynamic noise; dots in the central patch drifted left or right at 0.35 deg/sec. Subjects judged the direction of motion of dots in the central patch. (b) Percent correct discrimination of the direction of motion in the 2D motion-direction display. Discrimination is shown as a function of the intensity increment (as a percent of the "standard" intensity increment) of the stimulus dots on a gray background. The intensity increment where the dashed line-and-arrow intersects the performance line for standard displays equates standard (at reduced intensities) and polarity alternation displays (at standard intensities). The guessing baserate is 50%. Panels show the data of different subjects.

In two-frame experiments, or multi-frame experiments where two frames appear alternately, polarity alternation may lead to below-chance performance on direction discrimination (Anstis, 1970). Polarity alternation excites first-order (Fourier) spatio-temporal sensors for motion opposite to the veridical direction, as schematically illustrated in Fig. 2c. Here, in multi-frame movement, the (temporally and spatially) local support for movement in the opposite direction is apparently more than offset by second-order (nonFourier) processes sufficiently often that direction discrimination rarely falls below 50%. Chubb and Sperling (1988a, 1989a, b) show that the relative dominance of the first-order and second-order information in polarity alternation stimuli depends on the spatial scale (near viewing distances favor second-order information).

Motion segmentation (Experiment 6). In contrast with simple detection or discrimination of motion direction, a more complex direction task did show decrements in performance more comparable to those seen in shape identification. We developed a motion segregation paradigm in which nine small patches of uniformly moving dots were presented as a 3 x 3 grid embedded in a border of moving random dots (Fig. 9a). All but one patch depicted motion in the same direction (left or right), while the odd patch depicted motion in the opposite direction. The stimulus dots either remained above the background level (light on gray), or alternated polarity. (See Method Expt 6 for details.) In this situation, polarity alternation had a large impact on selection of the odd patch. MSL reported 95% correct locations with the standard display, but only 22.2% with polarity alternation. JBL reported 84% correct and 10.5%, respectively (chance = 11.1%) (Fig. 9b). The accuracy levels for polarity alternation displays are consistent with sophisticated guessing (see Discussion).

DISCUSSION

Fourier and nonFourier inputs to structure from motion

Vivid 3D shape percepts which allow accurate 3D shape identification can arise from appropriately constructed 2D image sequences depicting projections of those shapes under rotational motion. Typically these 2D sequences provide good input to first-order spatio-temporal ("Fourier") motion analyzers. In order to determine whether strong Fourier motion is a prerequisite to shape extraction, we examined display manipulations which maintain the identity-correspondence between points in successive frames, but disrupt first-order analysis. Interleaving blank frames or alternating token contrast-polarity both had devastating consequences for the ability to identify 3D shape in our displays. The inability to recover shape was not due to overall display flicker, since same-sign alteration in the intensity levels of particular tokens did not seriously disrupt performance. Subjectively, a sensation of local motion was maintained, and selected points could still be tracked. Nonetheless, this information was not adequate to support shape identification.

Fig. 9. (a) Schematic illustration of the nine-location forced-choice (9LFC) motion segmentation display. Subjects judged the location of the single patch moving opposite in direction to the other eight. (b) Percent correct location judgement for the 9LFC task for standard and alternating polarity displays for the two subjects. Guessing baseline is 11.1% (1 in 9).

The dependence of 3D shape perception on unambiguous first-order (Fourier) motion inputs suggests that, for our stimuli, direction and velocity serve as the primary input to a subsequent shape-extraction (structure from motion) computation (e.g., Koenderink & van Doorn, 1986). Obviously the velocity information must be computed simultaneously or nearly simultaneously at several locations in order to perform the 3D shape task.

The main alternatives to local-velocity-based computations depend on geometric analyses of identified feature elements and operate over more than two frames (e.g. Ullman, 1984). These alternative schemes are challenged by our finding that shape extraction is little affected by change in feature elements as often as every two frames. Further, subsequent work (Landy, Dosher, Sperling & Perkins, 1988) shows that motion displays of only two frames also support moderately good shape identification.

Williams and Phillips (1986, 1987) report what they consider a surprising perceptual phenomenon of perceiving a 3D shape in a random-dot flow field. We interpret their finding here as further evidence that a local velocity computation is the basis of perception of 3D shape. In their dynamic 2D displays, dots execute a random walk of constant step size, with displacement angle chosen from a uniform distribution with a range less than 150 deg. Subjects perceive a rotating and translating 3D cylinder. In these stochastic displays, velocity information is very similar to the local velocity information in a cylinder with dots sprinkled through its volume, rotating rigidly and translating along its axis of rotation (e.g. as displayed by Dosher, Landy & Sperling, 1989).* As in our experiments, the momentary distribution of velocities, not the stochastic trajectories of individual dots, determines the 3D percept.

3D shape extraction is especially impaired in displays that have contradictory or ambiguous first-order (Fourier) information. Control experiments demonstrated that contrast-polarity alternation, which essentially eliminated 3D shape identification, nonetheless left the detection judgement and the direction-of-motion judgement for a small isolated moving patch quite high. Motion segmentation, which requires analysis of motion direction in a number of local display regions, was profoundly affected by polarity alternation.

*In the display of a transparent cylinder filled with dots, rotating around a central vertical axis and translating upward, dots viewed through the middle of the cylinder have a greater range of lateral motion velocities and dots at the 2D edges have a smaller range of velocities; in Williams and Phillips' random flow field, there is a wide range of velocities throughout the display. However, at the edges, dots disappear and re-appear; this scintillation (as in Expt 2) reduces the magnitude of perceived depth; mean lateral velocity in both areas is zero. The effective flow fields for these differently constructed stimuli actually are quite similar.

A Fourier computation for the strength of first-order motion perception

Up to this point, we have talked in generalities about Fourier and non-Fourier computations of motion direction. Here we propose some very simple, specific, Fourier computations that account quite well for the results that we have attributed to first-order motion processes. The computation proceeds as follows.

(1) Compute the Fourier transform of the stimulus as it was viewed by the observer, i.e., with the correct visual angle and an accurate description of the display that was actually produced. Compute the power $p(\omega_x, \omega_t)$ of each spatio-temporal frequency component.

(2) Retain only the power $p_\epsilon$ that exceeds a small threshold $\epsilon \ge 0$, i.e.

$$p_\epsilon(\omega_x, \omega_t) = \max[\,p(\omega_x, \omega_t) - \epsilon,\; 0\,].$$

(3) Retain only the Fourier components that fall within a window of visibility (Watson, Ahumada & Farrell, 1986) that includes all spatial frequencies greater than zero and less than or equal to 30 cycles per degree of visual angle and all temporal frequencies greater than zero and less than or equal to 30 Hz, viz. $0 < |\omega_x|, |\omega_t| \le 30$.

(4) The net directional power, DP, of all frequencies within the window of visibility is the rightward power minus the leftward power:

$$DP = \sum_{\omega_x \omega_t < 0} p_\epsilon(\omega_x, \omega_t) \;-\; \sum_{\omega_x \omega_t > 0} p_\epsilon(\omega_x, \omega_t).$$

The computation gives equal weight to all motion components within the window of visibility and zero weight to all components outside the window. In a more refined analysis, it might be useful to weight spatial frequencies according to a contrast sensitivity function. However, it is not obvious how to weight signals that are above threshold. For practical purposes, it turns out that the exact size of the window of visibility has little influence on relative DPs for the stimuli considered here.
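The four steps above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a one-dimensional-space stimulus stored as an array `stim[t, x]`, uses NumPy's discrete Fourier transform, and assumes that rightward motion corresponds to frequency pairs of opposite sign. The function names `directional_power` and `moving_dot`, the 128-sample array size, and the one-sample-per-frame dot speed are all hypothetical choices made for the example.

```python
import numpy as np


def directional_power(stim, samples_per_deg=120.0, samples_per_sec=120.0,
                      eps=0.0, max_freq=30.0):
    """Net directional power DP of a space-time stimulus stim[t, x].

    Positive values favor rightward motion, negative values leftward motion
    (assumed sign convention: rightward energy lies where f_x * f_t < 0).
    """
    n_t, n_x = stim.shape
    # Step 1: power of each spatio-temporal frequency component.
    power = np.abs(np.fft.fft2(stim)) ** 2
    f_t = np.fft.fftfreq(n_t, d=1.0 / samples_per_sec)[:, None]  # Hz
    f_x = np.fft.fftfreq(n_x, d=1.0 / samples_per_deg)[None, :]  # cycles/deg
    # Step 2: retain only the power that exceeds the threshold eps.
    p_eps = np.maximum(power - eps, 0.0)
    # Step 3: window of visibility, 0 < |f| <= max_freq on both axes.
    window = ((np.abs(f_x) > 0) & (np.abs(f_x) <= max_freq) &
              (np.abs(f_t) > 0) & (np.abs(f_t) <= max_freq))
    # Step 4: rightward power minus leftward power.
    rightward = p_eps[window & (f_x * f_t < 0)].sum()
    leftward = p_eps[window & (f_x * f_t > 0)].sum()
    return rightward - leftward


def moving_dot(n=128, step=1):
    """Hypothetical test stimulus: one dot moving `step` samples per frame."""
    stim = np.zeros((n, n))
    t = np.arange(n)
    stim[t, (step * t) % n] = 1.0
    return stim


dp_right = directional_power(moving_dot(step=1))
dp_left = directional_power(moving_dot(step=-1))
print(dp_right > 0, dp_left < 0)
```

For this idealized dot, all of the energy inside the window lies on a single line through the origin, so DP takes opposite signs for the two directions of motion. The displays analyzed in the text are sampled more coarsely in time, which is what introduces power in the unintended direction.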

Basically, the left-minus-right difference, summed over all frequencies, is similar to the computation that is carried out by previously proposed first-order motion models. For example, within its window, an elaborated Reichardt motion detector (van Santen & Sperling, 1984) computes the algebraic sum of all velocity inputs that differ in temporal frequency. Velocity inputs that have the same temporal frequency (and therefore differ only in spatial frequency) are processed by detectors of different scales, sensitive to different spatial frequencies. Outputs of different detectors are combined at the next higher level (e.g. Adelson & Bergen, 1986).

Fig. 10. Stimulus representations and corresponding Fourier energy spectra (power) of various display conditions. (a, c, e, g, i, k) Each stimulus representation depicts 1.07 sec of planar motion of a single dot moving at a rate of 0.35 deg/sec. The abscissa is (horizontal) spatial location, and the ordinate is time. The representation assumes spatial resolution of 60 cycles per degree of visual angle, and temporal resolution of 60 Hz. The stimulus is either light or dark increments or decrements on a gray background. (b, d, f, h, j, l) The corresponding Fourier spectra are shown on $\omega_x$ (abscissa), $\omega_t$ (ordinate) axes. The inner boxes represent the window of visibility, assumed to resolve less than or equal to 30 c/deg and less than or equal to 30 Hz. The upper left (or lower right) quadrant of the spectra represents power at ($\omega_x$, $\omega_t$) consistent with the intended direction of motion. The upper right (or lower left) quadrant of the spectra represents power at ($\omega_x$, $\omega_t$) consistent with the unintended direction of motion. The representation and spectrum for the "standard" stimulus are shown in (a, b), for the half-contrast standard stimulus in (c, d), for the alternating-gray stimulus in (e, f), for the alternating-polarity stimulus in (g, h), for the alternating contrast 2:1 in (i, j), and for the alternating contrast 1.5:0.5 in (k, l).

A real detector, localized in space and time, cannot have the perfect resolution of a Fourier analysis of the entire x,y,t stimulus. The entire Fourier analysis is most appropriate for analyzing local areas where movement can be regarded as uniform and homogeneous. Even with all these qualifications, the straightforward Fourier analysis of the dot movement patterns is quite informative.

Fourier  analysis  of  the  stimuli 

The space-time (x,t) representations of a single dot element in each of the motion stimuli for our main conditions are shown in the left hand panels of Fig. 10. The Fourier power spectra for those stimuli are shown in the right hand panels of Fig. 10. Figure 10a represents a dot moving from left to right over frames. The dot is the standard intensity on the neutral background. The abscissa represents 1.07 deg of spatial position x from left to right; the ordinate represents a 1.07 sec interval of time, t, from bottom to top. The representation assumes a sampling density of 120 samples per degree of visual angle and 120 samples per second to yield temporal discrimination up to 60 Hz and spatial discrimination up to 60 c/deg of visual angle. (In this representation, the four refreshes of each new image frame are seen as four repeats at the same location in alternate 1/120 sec samples. The illuminated dots on our display are depicted as 2 adjacent spatial samples.) The steep space-time function reflects the fact that our stimuli move relatively slowly (0.35 deg/sec). Figure 10b shows the corresponding Fourier power spectrum. The abscissa is $\omega_x$ and the ordinate is $\omega_t$; the axes cross at $\omega_x = \omega_t = 0$.

If the standard motion stimulus were moving continuously in space and time, essentially all of its components would be at the intended direction and speed. Because it is sampled in time (60 Hz refresh and 15 new frames/sec) and in space (by the resolution of the pixel array) it contains ambiguous temporal and spatial components. Most of the power is in the intended direction and velocity (upper left and, symmetrically, lower right quadrants). But there is a surprising amount of power in the unintended direction as well (upper right and, symmetrically, lower left quadrants). The ($0 < |\omega_x|, |\omega_t| \le 30$) window of visibility is shown as the inner square in Fig. 10. The computed DP strongly favors the intended direction by 5:1.

Figures 10c and d show the stimulus representation and Fourier energy spectrum of a standard stimulus at half-intensity (approximately that of the contrast-equated control). The transform is the same as Fig. 10b, but of half power. With $\epsilon = 0$, the computed DP is exactly half; with $\epsilon > 0$, the computed DP is less than half.

Figures 10e and f show the stimulus representation and spectrum for the alternating gray-frame stimulus. In the case of gray-frame stimuli, power at the intended direction and velocity is halved, and approximately balanced by power dispersed over a range of velocities in the opposite direction.

Figures 10g and h show the stimulus representation and spectrum for the alternating-contrast polarity stimulus. In this case, the net directional power DP is of very slightly lower magnitude than for the standard stimulus, but favors the unintended over the intended direction (more power in the upper right and lower left quadrants).

Figures 10i and j show the stimulus with contrast alternation between 2 x and 1 x the standard intensity. This stimulus can be viewed as the sum of the standard stimulus and the alternating-gray stimulus. Although the 2:1 contrast-alternating stimulus has some of the diffuse power of the alternating-gray stimulus, 2:1 contrast alternation puts more power into the intended direction and velocity than even the standard stimulus. Figures 10k and 10l are for stimuli with contrast alternation between 1.5 x and 0.5 x the standard intensity. This 1.5:0.5 contrast-alternating stimulus can be viewed as the sum of the half-intensity standard stimulus and the alternating-gray stimulus. The computed DP is slightly lower than for the standard stimulus.
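The two decompositions can be checked frame by frame. As a sketch, write the dot contrast on successive new frames as a sequence in units of the standard contrast, and assume (an illustrative reading, not spelled out in the text) that the alternating-gray stimulus presents the dots at standard contrast on alternate frames and at zero contrast on the frames in between:

```latex
\begin{align*}
\underbrace{(2,\ 1,\ 2,\ 1,\ \dots)}_{\text{2:1 alternation}}
  &= \underbrace{(1,\ 1,\ 1,\ 1,\ \dots)}_{\text{standard}}
   + \underbrace{(1,\ 0,\ 1,\ 0,\ \dots)}_{\text{alternating gray}} \\[4pt]
\underbrace{(1.5,\ 0.5,\ 1.5,\ 0.5,\ \dots)}_{\text{1.5:0.5 alternation}}
  &= \underbrace{(0.5,\ 0.5,\ 0.5,\ 0.5,\ \dots)}_{\text{half-intensity standard}}
   + \underbrace{(1,\ 0,\ 1,\ 0,\ \dots)}_{\text{alternating gray}}
\end{align*}
```

Because the Fourier transform is linear, the amplitude spectra of the summed stimuli add. Their power spectra do not simply add, since squaring a sum produces cross terms, which is one way to see how the 2:1 stimulus can carry more power in the intended direction than the standard stimulus alone.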

Tasks 

The kinds of information needed for good performance in the various tasks are summarized in Fig. 11 and, along with the relation to computed DP, are explained below.

Detection. In Expt 4, we noted that simple two-interval forced choice detection (2IFC Detection) of a single local patch of moving dots is probably accomplished by other systems than the motion systems. The equality (or near equality) of detection with standard and polarity alternation displays ensures that polarity alternation did not result in peripheral cancellation of the input stimulus.

Fig. 11. A schematic illustration of the kinds of information required in order to perform each of the experimental tasks. The simple 2IFC detection task may reflect the output of nonmotion systems in a single location. The 2AFC discrimination of motion direction task requires the output of a motion direction mechanism in a single location. The 9LFC motion segmentation task requires the output of motion direction mechanisms in a number of locations nearly simultaneously. The 3D shape task requires direction and speed information from a number of locations nearly simultaneously.

Direction. Discrimination between left and right motion direction (two-alternative forced choice, 2AFC Direction) minimally requires direction (but not necessarily velocity) analysis by a motion detection system in a single location (Fig. 11). As shown by the Fourier spectrum of Fig. 10h, a first-order analysis of a polarity-alternation stimulus would support the unintended (opposite) direction of movement. A second-order analysis based on full-wave rectification would yield the correct direction and velocity. In full-wave rectification, the sign of contrast is lost, and the standard stimulus would be recovered. 2AFC-direction performance is impaired by polarity alternation, but still well above chance for a wide range of contrasts. Polarity alternation leads to high levels (about 88% correct) of 2AFC-direction performance at "standard" contrasts; hence, perceptual second-order analysis occurs under these conditions. But alternating-contrast-polarity stimuli require higher contrasts to yield equal direction discrimination than do standard stimuli, which stimulate first- plus second-order systems. This might reflect power loss in the second-order analysis, the need to overcome conflicting first-order information, or both.

Motion segmentation. To isolate which of 9 patches is moving in a direction opposite to the others requires that direction of motion be assessed in several locations (Fig. 11). We examine the consequences of observing (correctly perceiving the direction of motion in) n of the 9 locations. Observing just one patch, which is sufficient for the 2AFC-Direction task, would lead to chance performance of one-in-nine locations, identical to the guessing level without seeing the display. Observing any two patches could improve performance by sophisticated guessing. That is, if the two patches move oppositely, then one of them is the target; if they move in the same direction, one of the remaining 7 is the target. The probability of sampling two opposite-direction locations (2/9) times a guessing accuracy of 1/2, plus the probability of sampling two same-direction locations (7/9) times a guessing accuracy of 1/7, yields an estimate of 22.2% correct. Observing any three or more patches could improve performance by a combination of informed judgements and sophisticated guessing, etc. The data for polarity alternation do not require us to consider more than two observations. Performance for polarity alternating stimuli in the odd-in-nine motion segmentation task was indistinguishable from the simple 1 in 9 baseline (11%) for one subject (10%), and slightly above the 1 in 9 baseline for another (22%), which could be achieved by sampling only two locations.

Fig. 12. The relation between 3D shape identification performance and computed net directional power DP within the window of visibility and above a threshold $\epsilon$. Solid circles on the abscissa are values of DP computed from the spectra in Fig. 10, panels (b), (d), etc. for an $\epsilon$ of 0.12 x the maximum power value in the spectrum of the standard stimulus. Open circles on the abscissa are the values of DP computed for an $\epsilon$ of 0. (The rank order of conditions under the two computations is the same.) The 3D shape identification performance is monotone with DP for all reasonable values of $\epsilon \ge 0$.
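The sophisticated-guessing estimate can be reproduced with exact fractions. The sketch below is illustrative only: the function name `p_correct` is hypothetical, and the rule for three or more observed patches (an observed target is identified with certainty because it disagrees with at least two other observed patches) is an assumption added here to extend the two-patch argument given in the text.

```python
from fractions import Fraction


def p_correct(n_observed, n_patches=9):
    """Probability of locating the odd patch when the motion direction of
    n_observed randomly chosen patches (out of n_patches) is perceived."""
    if not 0 <= n_observed <= n_patches:
        raise ValueError("n_observed must lie between 0 and n_patches")
    if n_observed <= 1:
        # A single direction says nothing about which patch is odd:
        # pure guessing among all patches.
        return Fraction(1, n_patches)
    p_seen = Fraction(n_observed, n_patches)  # target among observed patches
    # Two observed patches that disagree: guess between them (1/2).
    # Assumption: with three or more, the disagreeing patch is the target.
    acc_seen = Fraction(1, 2) if n_observed == 2 else Fraction(1)
    p_unseen = 1 - p_seen
    # All observed patches agree: guess among the unobserved patches.
    n_rest = n_patches - n_observed
    acc_unseen = Fraction(1, n_rest) if n_rest > 0 else Fraction(0)
    return p_seen * acc_seen + p_unseen * acc_unseen


for n in (1, 2, 7):
    print(n, p_correct(n), round(100 * float(p_correct(n)), 1))
```

Under these assumptions, observing two patches gives 2/9 (22.2%), matching the estimate in the text, and observing seven gives 8/9 (88.9%).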

Motion segregation, like shape extraction, may be dependent on strong Fourier input, largely because it requires evaluation of motion signals at more than one location nearly simultaneously. The second-order motion system operates primarily foveally (Chubb & Sperling, 1988b). Two locations might be successively fixated in our 1 sec displays. For standard displays, performance in this task is excellent (85-95%). By similar computations, this would require observation of approximately 7 locations. Thus, first-order information supports direction of motion analysis at a number of locations simultaneously, while second-order information can support direction of motion analysis at only one or two.

*At certain moments during the rotation, dots on bumps move opposite to ground dots, and at other moments dots on depressions move opposite to ground dots. To solve the task by motion direction only would require sampling at least three frames. That is, to observe any motion at all requires two frames. Since there are only two categories of motion-direction response, from the motion observed in the first two frames, only two categories of dots could be observed (e.g. left or rightward moving). By observing a third frame, some of the dots that were categorized together in the first two frames could be differentiated (e.g. initially leftward, then rightward) and this could be used, in principle, to set up the three categories of dots (forward, center, behind) needed to solve the 3D shape discrimination task. However, we show (Landy et al., 1988) that two frames suffice for accurate performance. This means that at least three (moving leftward, moving rightward, not moving) and probably more categories of velocity information are available. Therefore, for the present discussion, we can assume that our 3D shape identification task has access to three-category velocity information; this velocity information obtained simultaneously from (at least) six locations would suffice to solve the task.

3D shape. The simplest solution to the 3D shape identification task requires simultaneous, or nearly simultaneous, knowledge of the motion-direction information (and possibly also the velocity) at the six bump locations (Sperling et al., 1989). The principle is that, to a first and adequate approximation, dots on bumps move in one direction, dots in depressions move in the opposite direction, and dots on the ground plane move very little. Thus, to solve the 3D-shape task, motion has to be categorized into 3 categories (leftward, rightward, and near zero) at a number of locations simultaneously.* Although the 3D-shape identification task could, in principle, be carried out with only this very coarse velocity information, more information usually is used. For example, in a version of the 3D-shape identification task with different bump heights, subjects can quickly discriminate three levels of bump height (Sperling et al., 1989). The bump-height discrimination is based on speed.*

Although a sophisticated local velocity computation probably underlies the 3D shape percept, for our set of stimuli, the simple (Fourier) net directional power, DP, computation offers an adequate account of performance in the 3D shape identification task. We assume that net directional power DP serves as a measure of the quality of first-order direction information in the various displays. If the 3D shape identification performance with our displays primarily depended on good first-order information, then the performance level for the various displays would increase monotonically with the quality of first-order information, here indexed by DP. Figure 12 shows the percent correct identification in the 3D shape task as a function of computed DP for the representative 2D motion displays (Fig. 10a-l). DP is in units of power normalized to the standard stimulus. Identification levels increase monotonically with DP, as expected.

Full-wave rectification of polarity alternation displays (second-order processing) would allow recovery of intended motion signals. However, 3D shape identification performance on these displays is approximately at chance levels (left half of Fig. 12). In principle, systematic DP favoring the unintended direction might be used in sophisticated guessing, but apparently is not. Performance on displays with polarity alternation may also reflect conflict between first-order and second-order motion information.

The effect of the power threshold c in the computation of DP may be understood by comparing 3D shape performance in the contrast equated (approximately half-power standard) and 1.5:0.5 contrast alternation stimuli. Without the power threshold c entering into computed DP, the contrast alternation 1.5:0.5 computed DP is only slightly higher than that for the half-intensity standard, while identification levels are quite different. However, even with c = 0, identification performance is monotone with DP. (DP computations with c > 0 and with c = 0 are shown, as filled and open circles, respectively, on the abscissa of Fig. 12.) Hence, the 3D shape data are consistent with a DP analysis of the outputs from a first-order (Fourier) motion system.

Why first-order motion for 3D shape perception?

First-order (Fourier) motion systems are assumed to be implemented with detectors like those schematized in Fig. 1. Second-order (non-Fourier) motion systems may implement some form of nonlinear transformation on the image intensities prior to further spatio-temporal analysis (see Chubb & Sperling, 1987). The two tasks in which second-order information could not be efficiently utilized, 3D shape recovery and motion segmentation, require information about motion direction (and velocity) in several local regions simultaneously. Hence, our evidence agrees with the evidence of Chubb and Sperling (1988a, b, 1989a, b) that the non-Fourier motion systems are most effective at large spatial scales, with foveal presentation, and do not function well in noncentral locations. For our stimuli, 3D structure was extracted primarily from first-order motion information.

Our stimuli were modestly complex but continuous surfaces in depth. The surfaces were depicted by randomly scattered and unconnected dots. Object transparency (where a portion of the stimulus which is behind a nearer portion of the surface can be seen) was allowed, but rarely occurred. (This form of representation is most similar to defining shape by local texture elements in naturalistic displays.) Precisely what the boundary conditions are on these findings remains to be determined. Because our dot stimuli are small, sparse, and hence of low total contrast power, they may be particularly poor stimuli for a second-order motion system. Prazdny (1986) reported an example of 3D shape from second-order motion stimuli (which do not effectively stimulate first-order mechanisms) for very simple (4 bend) wide wire figures. The wires were depicted by dense random dynamic noise against a background of dense static noise. His shapes were very simple, nonsurface shapes, and were not edited to exclude 2D information about identity. However, his thick wires are a better stimulus (than our dots) for a second-order system due to the large spatial scale.

In a subsequent paper (Landy, Sperling, Dosher & Perkins, 1988), we examine kinetic depth stimuli that are statistically invisible to Fourier detectors. We use various different stimulus tokens (dots, disks, wires) and backgrounds

We introduce an objective shape-identification task for measuring the kinetic depth effect (KDE). A rigidly rotating surface consisting of hills and valleys on an otherwise flat ground was defined by 300 randomly positioned dots. On each trial, 1 of 53 shapes was presented; the observer's task was to identify the shape and its overall direction of rotation. Identification accuracy was an objective measure, with a low guessing base rate, of the observer's perceptual ability to extract 3D structure from 2D motion via KDE. (1) Objective accuracy data were consistent with previously obtained subjective rating judgments of depth and coherence. (2) Along with motion cues, rotating real 3D dot-defined shapes inevitably produced a cue of changing dot density. By shortening dot lifetimes to control dot density, we showed that changing density was neither necessary nor sufficient to account for accuracy; motion alone sufficed. (3) Our shape task was solvable with motion cues from the 6 most relevant locations. We extracted the dots from these locations and used them in a simplified 2D direction-labeling motion task with 6 perceptually flat flow fields. Subjects' performance in the 2D and 3D tasks was equivalent, indicating that the information processing capacity of KDE is not unique. (4) Our proposed structure-from-motion algorithm for the shape task first finds relative minima and maxima of local velocity and then assigns 3D depths proportional to velocity.


In 1953, Wallach and O'Connell described a depth percept derived from motion cues that they called the kinetic depth effect (KDE). Since that time, there has been a great deal of research on the KDE, examining the effects of stimulus parameters such as dot numerosity in multidot displays (Braunstein, 1962; Green, 1961), frame timing (Petersik, 1980), occlusion (Andersen & Braunstein, 1983; Proffitt, Bertenthal, & Roberts, 1984), the detection of nonrigidity in the three-dimensional form most consistent with the stimulus (Todd, 1982), and veridicality of the percept (Todd, 1984, 1985).

Since 1979, there have been numerous attempts at modeling how observers and machines could derive three-dimensional (3D) structure from two-dimensional (2D) motion cues. Ullman (1979) referred to this computational task as the structure-from-motion problem. Ironically, Ullman's model and most ensuing ones do not explicitly use motion cues. These models are essentially geometry theorems concerning the minimal number of points and views needed to specify the shape under various simplifying constraints such as assumed object rigidity and assumed parallel perspective (Bennett & Hoffman, 1985; Hoffman & Bennett, 1985; Hoffman & Flinchbaugh, 1982; Ullman, 1979; Webb & Aggarwal, 1981). From the geometric models, iterative models have been developed that use newly arrived position data, not to derive the true structure, but to improve the current 3D representation in the sense of maximizing its rigidity (Landy, 1987; Ullman, 1984). Only a few models actually use point velocity (i.e., an optic flow field) in addition to point position (e.g., Clocksin, 1980; Koenderink & van Doorn, 1986; Longuet-Higgins & Prazdny, 1980), and one model also uses point acceleration (Hoffman, 1982).

It has been difficult to relate models of the KDE to the results of psychological studies. An important component of the problem has been the difficulty of finding an appropriate experimental paradigm. Many KDE experiments have used subjective ratings of "depth" or "rigidity" or "coherence" as the responses (see Dosher, Landy, & Sperling, 1989, for a review). Relating subjective responses to a process model of KDE is problematic. Typically, a structure-from-motion model yields a shape specification. To link the derived shape to subjective judgments, and thereby to experimental results, a decision-making apparatus to predict judgments is needed, and this may be quite complex.

Objective  Measurements  of  KDE:  Problems 

Because the ability to derive structure from motion presumably
evolved to solve an objective environmental problem, a
better approach to studying KDE is to measure the accuracy




The work described in this article was supported by The Office of
Naval Research, Grant N00014-85-K-0077, and by the U.S. Air Force
Life Sciences Directorate, Visual Information Processing Program,
Grants 85-0364 and 88-0140.

Correspondence concerning this article should be addressed to
George Sperling, Psychology Department, New York University, 6
Washington Place, Room 980, New York, New York 10003.


of the KDE in an objective fashion. Does the observer perceive the correct shape in a display? The correct depths? The correct depth order? The correct curvature? Some of the studies cited earlier attempted to answer such questions by using objective response criteria (e.g., percentage correct in a one- or two-interval forced-choice task). Unfortunately, in almost every case, subjects can achieve good performance on the task by neglecting perceived depth and consciously or unconsciously formulating their responses on the basis of other cues. In these cases, there is a simple non-KDE cue sufficient to make the judgment accurately. Although the subject may not consciously be using these artifactual cues to make correct judgments, we cannot be sure of the basis of the response until the artifactual cues have been eliminated or rendered useless (e.g., through irrelevant variation).

Let us consider some examples. Lappin, Doner, and Kottas (1980) presented subjects with a two-frame representation of dots randomly positioned on the surface of an opaque rotating sphere displayed by polar projection. On the second frame, a certain percentage of the dots were deleted and replaced with new random dots. Subjects were required to determine which of two such two-frame displays had a higher signal-to-noise ratio (in terms of dot correspondences). Lappin et al. (1980) interpreted their results in terms of the "minimal conditions for the visual detection of structure and motion in three dimensions" (p. 717), which is the title of their article. Indeed, the signal dots represent two frames of a rigid rotating sphere. But subjects do not need to correctly perceive a 3D sphere in order to make a correct response. There was no analysis offered of how far a 3D perception could diverge from spherical and still yield the observed accuracy of response. Alternatively, subjects might base their responses on perceived 2D flow fields, judging the percentage of dots in the first frame that have corresponding dots in the second frame. This 2D judgment need not use the entire motion flow field. For example, the 5.6° 3D motion of the sphere corresponds to a small, essentially linear translation in the center of the field. Discriminating signal-to-noise ratios in translations is related to Braddick's (1974) "dmax" procedures for discriminating perceived linear motion; it does not necessarily have anything to do with KDE. Thus, although Lappin et al. used response accuracy as their dependent variable, the subject's ability to estimate a signal-to-noise ratio may have been artifactual and certainly is not easily converted into an estimate of the accuracy of KDE.

Petersik (1979, 1980) represented rotating spheres by surface elements that were dots or small vectors. In both studies, the spheres were displayed with polar projection, and subjects were required to discriminate clockwise from counterclockwise rotation. A possible artifact here is that the motion of a single stimulus element provides sufficient information to respond correctly. That is, under polar perspective, stimulus points follow elliptical paths in the image plane. To determine rotation direction, the subject needs only determine the 2D rotation direction of a single point (assuming knowledge of the vertical position of the point with respect to eye level). Petersik made the task more difficult by adding noise to some dot paths, by varying the slant of vector elements from frame to frame, or by varying the numerosity. However, none of these manipulations prevents the subject from using a purely 2D, non-KDE strategy. Indeed, Braunstein (1977) had previously examined precisely this point. Braunstein demonstrated that only the vertical component of the polar perspective transformation was used by subjects for a depth-order judgment, and that this component was sufficient.


Andersen and Braunstein (1983) also used discrimination of rotation direction to evaluate KDE. Their displays represented clumps of dots on the surface of a sphere. A clump was construed as being bounded by an invisible pentagon, whose presence was made known by the fact that, when it lay on the front surface of the sphere, it occluded dots that lay behind it on the rear surface. These spheres were displayed by parallel perspective, and the cue to depth order (front, rear) was provided by occlusion. Again, although the dependent variable was response accuracy, a subject did not need to perceive a 3D object to determine the direction of rotation; the subject needed only to determine the movement direction of the continuously visible clumps.

In several studies, simple relative velocity cues are all that the subject needs to perform the KDE task. Braunstein and Andersen (1981) displayed a multidot representation of a dihedral edge that moved horizontally. The dots were displayed using polar projection, so that horizontal point velocities were inversely proportional to depth. Thus, the display contained a velocity gradient that either increased or decreased from the midline of the display to the upper and lower edges of the display. Subjects judged whether a given display represented a convex or concave edge. In this task, comparing the relative velocity of points in the center and at the top edge of the display is all that is necessary to perform accurately (the location with the greater velocity is judged "forward").

In experiments by Todd, subjects determined which of five
curvatures (Todd, 1984) or slants (Todd, 1985) were depicted
in a multidot display. Again, Todd described the task in terms
of the perceived 3D object, but accurate performance is
possible by comparing the relative velocities of points in just
two areas of the display.

In all the studies just cited, the subject could perform the required KDE task by using a minimal artifactual cue. One possible solution to the problem of subjects learning to use artifactual cues is to withhold feedback. The assumption is that, without feedback, the subject will use only perceived 3D shape. This approach has been used extensively by Todd (1982, 1984, 1985). Unfortunately, withholding feedback does not mean that the subject cannot use an alternative perceptual or decision strategy to supplement judgments of perceived KDE depth. One strategy that subjects often adopt without feedback is to adjust their responses so as to respond equally (or nearly equally) often with each of the possible responses. For example, Todd's (1984) procedure is vulnerable to this artifact of strategy. He used surface dots to represent cylinders with five different curvatures. On a given trial, subjects judged which of the five curvatures was presented. As an alternative to perceiving KDE depth, a subject could judge the apparent velocity of dots in the center of the display and use the knowledge of the velocities displayed on previous trials to choose a curvature category. Indeed, subjects are extremely good at estimating the mean velocity and variations from it in a sequence of displays (McKee, Silverman, & Nakayama, 1986). Although the subjects' use of a trivial strategy that estimates just a single velocity per trial may not explain the entirety of Todd's results, it predicts the nearly veridical character of subject responses and thereby could account for most of the data.

Objective  Measurement  of  KDE:  Proposed  Solution 

The KDE is a perceptual phenomenon that allows subjects to perceive the relative depth of different positions in visual space and hence to infer the shape of objects in the environment. In all of the experiments we have discussed, the shapes presented were very simple (spheres, cylinders, and planes), and hence simple response strategies would have been effective. None of the experiments discussed above requires the subject to use a perceived 3D shape in order to perform accurately. In all of the studies we reviewed, subjects had the opportunity to use artifactual cues. None of these experiments presented shapes with complexity approaching that seen in the real world in which the ability to compute structure from motion evolved.

In this article, we describe a new method for investigating KDE. Our aim is to provide, instead of the demonstration of KDE by means of perceptual reports (what subjects say they see), a test of perceptual abilities (what complex shape properties subjects can extract from visual flow fields). The task is shape identification, in which on each trial, one of a large lexicon of shapes is presented. Each shape consists of a flat ground with zero, one, or two bumps or depressions. The bumps and depressions vary in position, 2D extent, and orientation. Because of the way the lexicon of shapes is constructed, good performance in the shape identification task requires simultaneous local computation of velocity in many positions of the display and global coordination of the local information.

Experiment  1:  Dot  Numerosity  and  Bump  Heights 

To demonstrate the shape identification method and to investigate its limits, we replicated and extended one of the classic findings in multidot KDE, the dependence of quality ratings (usually combined coherence and rigidity, or "goodness") on dot numerosity (Braunstein, 1962; Dosher et al., 1989; Green, 1961; Landy, Dosher, & Sperling, 1985). Quality of KDE generally has been found to increase with dot numerosity. We investigated the effects of dot numerosity and depth extent on the effectiveness with which subjects used the KDE to identify the target shape from among its many close competitors.

Method

Subjects. Three subjects were used in the study. Two were authors
of this article, and the third was a graduate student naive to the
purposes of the experiment. Two subjects had normal or corrected-to-normal
vision; one subject (CFS) had vision correctable only to
20/40.

Displays. The shapes used in the experiment were 3D surfaces consisting of zero, one, or two bumps or concavities on an otherwise flat ground. Here we use the term shape to indicate the positions of these bumps and concavities on the flat ground, irrespective of other stimulus parameters that were varied, including bump height, number of dots used to represent the shape, and rotation direction. The shapes were constructed as follows (see Figure 1A). Within a square area with sides of length s, a circle with diameter 0.9s was centered. All depth values outside the circle were set to zero (i.e., to the object base plane, which in the initial display was the same as the image plane). For each of three positions inside the circle (located at the vertices of an equilateral triangle), the depth was specified as either +h (a distance h in front of the object base plane, closer to the observer), 0 (in the object base plane), or -h (behind the object base plane). A smooth shape was constructed using a standard cubic spline algorithm, which passed through the flat surround and the vertices of the triangle. For a given set of vertices, 27 shapes were constructed in this way (see Figure 1B for some examples).

Two different sets of vertices were used to generate shapes. These were either at the corners of a triangle pointing up (designated u) or of a triangle pointing down (designated d). Shapes were denoted by indicating the trio of positions (u or d), and then specifying for each position (in the order shown in Figure 1A) whether that position was in front of the object base plane (+), in the plane (0), or behind it (-). For example, the shape denoted by u+-0 consists of a bump in the upper central area of the display, a depression in the lower left of the shape, and a flat area in the lower right of the shape (see Figure 1B). Note that u000 and d000 both designate the same shape: a flat square. Fifty-three distinct shapes can be generated in this manner.
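The count of 53 follows from the naming scheme: two triangles times 3 × 3 × 3 depth-sign assignments gives 54 names, and the two all-flat names refer to one shape. A minimal sketch of that enumeration is given here; the function name and the label "flat" are illustrative choices, not part of the original method.

```python
from itertools import product

TRIANGLES = "ud"      # vertex set: triangle pointing up or down
DEPTH_SIGNS = "+0-"   # bump, flat, or depression at each vertex


def shape_lexicon():
    """Enumerate the distinct shape names; u000 and d000 collapse to one flat shape."""
    shapes = set()
    for triangle, signs in product(TRIANGLES, product(DEPTH_SIGNS, repeat=3)):
        code = "".join(signs)
        # Both all-flat names describe the same flat square.
        shapes.add("flat" if code == "000" else triangle + code)
    return shapes


lexicon = shape_lexicon()
print(len(lexicon))  # 53
```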

Displays were generated for all combinations of the 53 shapes, three dot numerosities, and three bump heights. For the flat shape (denoted u000 or d000), varying bump height has no effect and so there are only three flat shape display types (corresponding to the three numerosities). For all other shapes there are nine display types. This results in 471 display types. For most display types, a single instantiation was generated (choosing a set of random dots and forming a display after rotation and projection). For each of the display types for the flat shape, six instantiations were made. Thus, there were 486 different displays. Bump height, h, was 0.5s, 0.15s, or 0.05s, where s is the length of a side of the square ground. The 3D perspective drawings of the shapes in Figure 1B are for the largest bump height. Dot numerosities were 20, 80, and 320. The bump height and dot numerosity manipulations are illustrated in Figures 1C and 1D, respectively.
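The display counts can be checked with a few lines of arithmetic. The sketch below assumes that "most display types" means every non-flat type received exactly one instantiation, which is the reading that reproduces the stated totals; the constant and function names are illustrative.

```python
N_SHAPES = 53         # distinct shapes, including the single flat shape
N_NUMEROSITIES = 3    # 20, 80, or 320 dots
N_HEIGHTS = 3         # 0.5s, 0.15s, or 0.05s
FLAT_INSTANCES = 6    # instantiations made for each flat display type


def display_counts():
    """Return (number of display types, number of distinct displays)."""
    non_flat_types = (N_SHAPES - 1) * N_NUMEROSITIES * N_HEIGHTS
    flat_types = N_NUMEROSITIES  # bump height is irrelevant for the flat shape
    n_types = non_flat_types + flat_types
    # Assumption: one instantiation per non-flat type, six per flat type.
    n_displays = non_flat_types + flat_types * FLAT_INSTANCES
    return n_types, n_displays


print(display_counts())  # (471, 486)
```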

Multidot displays of these shapes were generated by choosing a random sample of positions on each surface, rotating the resulting set of points about a fixed vertical axis, and projecting them onto an image plane via parallel projection. The 3D motion was a single cycle of a sinusoidal rotation about a fixed vertical axis through the center of the object base plane, with amplitude of 25° and period of 30 frames. More specifically, the angle at which the base plane was oriented with respect to the image plane was $\theta(m) = \pm 25 \sin(2\pi m / 30)$ degrees, where m is the frame number within the 30-frame display.
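The rotation schedule and its consequences for the display can be tabulated directly. This is a sketch under stated assumptions: frames are indexed from 0, the flat ground's projected width under parallel projection scales with the cosine of the rotation angle, and the 60-Hz refresh with four repeats per frame is taken from the display description later in the Method. All names are illustrative.

```python
import math

AMPLITUDE_DEG = 25.0   # rotation amplitude stated in the text
PERIOD_FRAMES = 30     # one full sinusoidal cycle per 30-frame display
REFRESH_HZ = 60        # monitor refresh rate (from the Method)
REPEATS = 4            # refreshes per display frame (from the Method)


def base_plane_angle(m, sign=1):
    """Base-plane angle in degrees at frame m; sign selects the rotation direction."""
    return sign * AMPLITUDE_DEG * math.sin(2 * math.pi * m / PERIOD_FRAMES)


def relative_width(angle_deg):
    """Projected horizontal extent of the flat ground relative to face-forward."""
    return math.cos(math.radians(angle_deg))


frames = range(PERIOD_FRAMES)  # assumption: frames indexed 0..29
peak_angle = max(abs(base_plane_angle(m)) for m in frames)
min_width = min(relative_width(base_plane_angle(m)) for m in frames)
new_frames_per_s = REFRESH_HZ / REPEATS
duration_s = PERIOD_FRAMES / new_frames_per_s

print(f"peak angle {peak_angle:.2f} deg, minimum relative width {min_width:.3f}")
print(f"{new_frames_per_s:.0f} new frames/s, duration {duration_s:.1f} s")
```

The minimum relative width comes out near 0.91, consistent with the statement that the display shrinks to about 90% of its initial horizontal extent at the reversal points.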

Two rotation directions were used, indicated as l and r, corresponding to whether the left or right edge of the display came forward initially. Equivalently, this described the side of the observer to which the shape "faced" in the second half of the rotation (which was usually an easier way to code the response). For an l rotation (see Figure 1E), the object initially appeared face-forward. It was then rotated so that the front moved to the right until the object had rotated 25°. Then it reversed direction and rotated to the left until it was 25° to the left of its initial orientation. Finally, it again reversed direction and rotated until the ground plane was again perpendicular to the line of sight. A full description of a display by a subject included the indication of the set of vertices (u or d), the 3D depths at these vertices (+, -, 0), and the direction of rotation (l or r), for example, u+-0l.

Figure 1. Stimulus shapes, rotations, and their designations. (Shapes were constructed by smoothly splining a flat ground and three points that were either toward the observer [plus sign], in the flat ground [zero], or away from the observer [minus sign].) A: These three points were at the corners of one of two possible equilateral triangles, for which the odd point is up (u) or the odd point is down (d). In the experiment, subjects were required to name the shape and rotation direction perceived. The numbers specify the order in which the depth signs of the three points are to be reported. B: The various combinations result in a lexicon of 53 shapes; typical examples are illustrated here as perspective plots. The orientation of these plots relative to the viewing direction is indicated on the first example.

(Figure continues)

Because of the parallel projection, simultaneous reversal of depth signs and of rotation direction yields precisely the same physical image sequence. The 486 displays described earlier were all generated with the l rotation, but each can equally well be described as an r rotation of the sign-reversed shape. There are 108 ways to designate a display by combining an up or down shape-type with a bump, depression, or flat surface at three different locations with a left or right initial direction of motion; that is, {u, d} × {+, 0, -}³ × {l, r}.
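The equivalence of a sign-reversed shape rotating in the opposite direction can be verified numerically. The sketch below is not the authors' display code: it uses random 3D points rather than splined surfaces, and the coordinate convention (x horizontal, y vertical, z depth, rotation about the y axis) is an assumption made for illustration.

```python
import math
import random


def project(points, theta_deg):
    """Rotate points about the vertical (y) axis, then drop depth (parallel projection)."""
    t = math.radians(theta_deg)
    return [(x * math.cos(t) + z * math.sin(t), y) for x, y, z in points]


def image_sequence(points, sign, amplitude=25.0, n_frames=30):
    """Image frames for one sinusoidal rotation cycle; sign sets the rotation direction."""
    return [
        project(points, sign * amplitude * math.sin(2 * math.pi * m / n_frames))
        for m in range(n_frames)
    ]


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-0.5, 0.5))
          for _ in range(20)]
depth_reversed = [(x, y, -z) for x, y, z in points]

seq_original = image_sequence(points, sign=+1)
seq_reversed = image_sequence(depth_reversed, sign=-1)

# Largest difference in image x position across all frames and dots.
max_diff = max(
    abs(xa - xb)
    for frame_a, frame_b in zip(seq_original, seq_reversed)
    for (xa, _), (xb, _) in zip(frame_a, frame_b)
)
print(max_diff < 1e-12)  # True
```

The identity holds because x cos θ + z sin θ is unchanged when z and θ both change sign, so no 2D measurement can distinguish the two interpretations.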

For most shapes, there are two equally valid ways to describe the display. For example, u+-0l and u-+0r describe the same display. The flat shape is denoted equally accurately as u000l, u000r, d000l, and d000r. Given the four instantiations of the flat shape, chance performance depends on subject strategy. Repeated responses of u000l (and its equivalents) yield a guaranteed performance of 18 in 486 correct (or 2 in 54). Random guessing yields an expected performance of just over 1 in 54 correct. Subjects did not designate bump height in their responses. Except in the case of the flat stimuli, bump height was obvious.
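Both chance rates can be reproduced with exact fractions. The sketch assumes that random guessing means choosing uniformly among the 108 designations, that 2 designations are correct for each of the 468 non-flat displays, and that 4 are correct for each of the 18 flat displays; the variable names are illustrative.

```python
from fractions import Fraction

N_DESIGNATIONS = 108   # 2 triangles x 27 sign patterns x 2 rotation directions
N_NONFLAT = 468        # non-flat displays, 2 correct designations each
N_FLAT = 18            # flat displays, 4 correct designations each
N_TOTAL = N_NONFLAT + N_FLAT

# Strategy 1: always answer with a flat designation such as u000l.
always_flat = Fraction(N_FLAT, N_TOTAL)

# Strategy 2: pick one of the 108 designations uniformly at random on each trial.
expected_correct = (N_NONFLAT * Fraction(2, N_DESIGNATIONS)
                    + N_FLAT * Fraction(4, N_DESIGNATIONS))
random_guess = expected_correct / N_TOTAL

print(always_flat, float(always_flat))
print(random_guess, float(random_guess))
```

The first rate reduces to 1/27 (2 in 54), and the second to 14/729, which is slightly above 1 in 54 because the flat displays accept four answers.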

After sampling, rotation, and projection, any given frame of the display consisted of n points in the image plane. These points were displayed as bright dots on a dark background. The square image extent of the displays projected to a 182 × 182 pixel area subtending 4° of visual angle. The displays were not windowed in any way, so the edges of the display oscillated in and out with the rotation. With the 25° wiggle, at the instants when rotation reverses, the display has shrunk to 90% of its initial horizontal extent.

Displays were presented on a background that was uniformly dark (approximately 0.001 cd/m²). Dots were single pixels of approximately 65 μcd and were viewed from a distance of 1.6 m. A trial sequence consisted of a cue/fixation spot presented for 1 s, a 1-s blank interval, and the 2-s stimulus sequence. The stimulus sequence was followed by a blank screen, the luminance of which was the same as the background of the stimulus. The display was run at 60 Hz noninterlaced. Each display frame was repeated four times, for an effective rate of 15 new frames per second. The duration of each 30-frame display was 2 s.

Apparatus. Stimuli were computed in advance of the session and
stored on disk. The stimuli were processed for display by an Adage
RDS-3000 image display system and were displayed on a Conrac


830 


SPERLING, LANDY, DOSHER, AND PERKINS


Figure 1 (continued). C: Three bump heights were used: 0.5s, 0.15s, and 0.05s, where s is the length of a side of the square base of the shape. The shape depicted here is u+++. D: Three dot numerosities were used: 20, 80, and 320. Pictured are the first frames of a representative display in each numerosity condition. E: Two rigid rotation motions were simulated. Both were sinusoidal rotations about a vertical axis through the center of the object ground. The object either first rotated to face the subject's right, then to the subject's left, then returned face-forward (l), or in the opposite direction (r).


7211C19 RGB color monitor. The stimuli appeared as white dots on
a black background.

Viewing conditions. Stimuli were viewed monocularly (with the
dominant eye) through a black-cloth viewing tunnel. In order to
minimize absolute distance cues, a circular aperture slightly larger
than the square display area restricted the field of view. Stimuli were
viewed from a distance of 1.6 m. After each stimulus presentation,
the subject typed a response on a computer terminal. Room illumination
was dim. (Illuminance was approximately 8 cd/m².)

Procedure. Subjects were shown perspective drawings of the
shapes (as in Figure 1B) and were instructed as to how they were
constructed and named. They were told that they would be shown
multidot versions of these shapes and would be required to name the
shape displayed and its rotation direction as accurately as possible.
They were told to use any method they chose to remember and apply
the shape and rotation designations.

Each of the 486 displays was viewed once by each subject. The displays were presented in a mixed-list design in four sessions of 45 min each. After each response, the possible correct responses were listed as feedback. For each stimulus, there were always two responses that were scored as correct (given perceptual reversals). For the flat stimuli, four possible answers were correct.

To become familiar with the task and the method of response,
each subject ran trials consisting of 27 of the easiest stimuli (the
320-dot, 0.5s-height stimuli). Subjects ran until accuracy was at least
85% correct (approximately 100-1,*


Results 

Accuracy data. All subjects reported that they perceived a
3D surface the first and every subsequent time they viewed
the high numerosity displays. With low numerosities, the dots
were perceived in approximately their correct positions in 3D
space, but there were too few dots to give the illusion of a
continuous surface or to discriminate unambiguously between
alternative responses. The very limited practice served merely
to teach the subjects to name the perceived shapes without
having to refer to the drawings.

The results of Experiment 1 are summarized in Figure 2.
Each response was scored as correct only if both the shape
and the rotation direction were correct or consistent. That is,
for a given display, both the veridical response and its
depth-reversed counterpart were scored as correct. Every
other response was incorrect. There were occasional responses
with the correct shape and the incorrect rotation direction
(about 4% of all responses, 10% of all errors). Subjects later
indicated that most of these were a result of forgetting the
direction of rotation before the response was completed,
rather than from a truly misrotating percept. Nevertheless,
such responses were treated as incorrect.

As expected, accuracy improved both with the numerosity
and with the amount of depth displayed. There were signs of
a ceiling in performance as numerosity increased. For two
subjects, for 320-point displays, the curves crossed, and the
intermediate depth extent (0.15s) was as good or better than
the large 0.5s depth extent. An analysis of variance was
computed treating numerosity, height, and subjects as treatments,
and the shapes/rotations as the experimental units. Both
numerosity and degree of depth were highly significant (p < .0001),
with F(2, 106) = 119.3 and F(2, 106) = 102.9, respectively.
Subjects differed significantly from one another, F(2, 106) =
33.5, p < .0001. The three-way interaction was significant,
F(8, 424) = 2.6, p < .01, indicating that the interaction of
height and number differed among subjects (see Figure 2).
No two-way interactions were significant.

Figure 2. Performance on the shape identification task as number
of points in the simulated shape was varied. (The parameter is the
height of the bumps relative to the length of a side. Each panel
represents data from a different subject. Performance increased with
both numerosity and bump height.)

Error analysis. A confusion matrix was computed,
pooled across subjects, the nine conditions, two rotation
directions, and two possible designations of each shape or
depth reversals (it was thus a 27 × 27 = 729 cell matrix).
Table 1 is a summary of these identification errors. Descriptions
are given for seven common error types, one uncommon
error type, and a miscellaneous category. If a bump and a
depression were present in the display, and only one of the
two was indicated by the subject, this was called a missed
feature error. If the bump and depression were of equal extent
on the base plane (e.g., u+-0), then this was called a missed
equal size feature. If they were of unequal extent and the
smaller of the two was not reported, this was categorized as a
missed smaller feature. Any display that contained only one
depth sign (such as u+00) and was reported as containing
both depth signs (e.g., u0+-) was categorized as report two
depth signs when there was only one. For any given row in
the table, the second column presents examples of errors of
that row type. The third column lists the number of cells in
the confusion matrix that correspond to an error of a given
type, and the fourth column provides the total number of
errors that occurred over all cells of that type. The last column
is the average number of errors per cell in cells of that type,
computed as the ratio of the number of errors indicated in
Column 4 divided by the number of cells in Column 3. In
total, there were 586 errors; divided by 702 error cells (the
729 cells minus the 27 cells on the diagonal), this yields
0.83 errors per cell on the average. A ratio greater than
0.83 in Column 5 of Table 1 indicates an error type more
common than the average; a smaller number indicates a less
common than average error type.
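The errors-per-cell bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis code. The counts are those legible in Table 1, with one labeled assumption: the error count for the first class is not legible and is inferred from its printed ratio (2 cells × 14.5 = 29).

```python
# (description, cells, errors) per error class, as read from Table 1.
# ASSUMPTION: the error count for the first class (29) is inferred from
# its printed ratio of 14.5 over 2 cells; it is not legible in the table.
TABLE_1 = [
    ("Small distortions of large bumps", 2, 29),
    ("Incorrect bump width, correct location", 4, 34),
    ("Missed smaller feature", 6, 30),
    ("Diagonal bump reported as huge bump", 8, 23),
    ("Missed equal size feature", 12, 29),
    ("Incorrect diagonal bump size", 8, 16),
    ("Small horizontal location error", 16, 27),
    ("Report two depth signs when there was only one", 168, 40),
    ("Other errors", 478, 358),
]


def errors_per_cell(cells, errors):
    """Average number of errors per confusion-matrix cell (Column 4 / Column 3)."""
    return errors / cells


total_cells = sum(cells for _, cells, _ in TABLE_1)
total_errors = sum(errors for _, _, errors in TABLE_1)
baseline = errors_per_cell(total_cells, total_errors)

for name, cells, errors in TABLE_1:
    ratio = errors_per_cell(cells, errors)
    flag = "above" if ratio > baseline else "below"
    print(f"{name:48s} {ratio:6.2f}  ({flag} average)")
```

The totals recovered this way (702 cells, 586 errors) agree with the figures quoted in the text, which is a useful check on the table as read here.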

The bottom row of the table provides summary information.
The first seven error types listed had ratios well over this
value and thus were more common than other errors. The
report two depth signs . . . error type is an example of an
exceedingly uncommon error.

The quantity of data collected was not sufficient to enable
us to confidently draw many specific conclusions from the
error data. The hypothesis that errors are distributed uniformly
across the nine error classes was easily rejected, χ²(8,
N = 586) = 1,032, p < .001. It appears that four types of
errors were the most prevalent. Large single bumps were
highly confusable, especially the subtle difference in shape
that distinguishes d+++ from u+++, but also that which
distinguishes between d+++ and d0++, and so on. Errors were
made in horizontal location of the shape within the ground
(e.g., d+00 was reported as being u+00, or d++0 as u+0+).
Errors were also made in judging the width of the bumps
(e.g., d+00 was reported as u0++). Finally, for displays for
which both a bump and a concavity were present, occasionally
one of the two was not noticed. It is interesting to note that
in every case of this type of error (the missed smaller features
and missed equal-size features of Table 1, and the less common
missed larger features), the response was of a single
bump toward the observer. In other words, in the presence of
a perceived convexity, a concavity is occasionally missed, but
not the other way around. On the other hand, when only one
nonzero depth was present (a single bump or concavity), it
was very rare for subjects to give a response containing
multiple depth signs.

Table 1. Summary of Identification Errors Pooled Over Subjects, Bump Heights, Dot Densities, Rotation Directions,
and Depth Reversals

Description                                      Examples                                      Number    Number     Ratio(a)
                                                                                               of cells  of errors
Small distortions of large bumps                 u+++ interchanged with d+++                       2        29       14.5
Incorrect bump width, correct location           [illegible]                                       4        34        8.5
Missed smaller feature                           [illegible] reported as u++0                      6        30        5.0
Diagonal bump reported as huge bump              [illegible] reported as u+++ or [illegible]       8        23        2.9
Missed equal size feature                        u+-0 reported as u+00                            12        29        2.4
Incorrect diagonal bump size                     [illegible]                                       8        16        2.0
Small horizontal location error                  u+00 interchanged with d+00                      16        27        1.7
Report two depth signs when there was only one   u+00 reported as u+-0                           168        40        0.24
Other errors                                     -                                               478       358        0.75
All errors                                       -                                               702       586        0.83

(a) Total number of indicated error responses divided by total number of applicable cells (Column 4/Column 3). A ratio greater than 0.83
indicates a type of error that is more common than average.
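The chi-square test of uniformity reported above can be approximated with a short sketch. It rests on an assumption about what "uniform" means here: that errors are spread evenly over the 702 error cells, so that the expected count for each error class is proportional to its number of cells. The text does not spell this out. The counts are those of Table 1 as read here (the 29 for the first class is inferred from its printed ratio), and the function name is ours.

```python
# Cells and observed errors for the nine error classes of Table 1.
# ASSUMPTION: the first error count (29) is inferred from the printed ratio.
CELLS = [2, 4, 6, 8, 12, 8, 16, 168, 478]
ERRORS = [29, 34, 30, 23, 29, 16, 27, 40, 358]


def chi_square_uniform_over_cells(cells, observed):
    """Pearson chi-square against errors spread evenly over all cells.

    Expected errors for a class = total errors * (class cells / total cells).
    Returns the statistic and its degrees of freedom (classes - 1).
    """
    n = sum(observed)
    total_cells = sum(cells)
    chi2 = 0.0
    for c, o in zip(cells, observed):
        expected = n * c / total_cells
        chi2 += (o - expected) ** 2 / expected
    return chi2, len(cells) - 1


chi2, df = chi_square_uniform_over_cells(CELLS, ERRORS)
print(f"chi2({df}, N = {sum(ERRORS)}) = {chi2:.1f}")
```

Under this reading the statistic comes out close to the reported value of 1,032 with 8 degrees of freedom, which suggests that the expected counts were indeed scaled by the number of cells in each class.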

When the confusion matrix was broken down by experimental
condition, the amount of data was rather low. Nevertheless,
a few interesting trends were evident. First, all seven
common error types (the first seven rows of Table 1) remained
common in all experimental conditions. As the task
became more difficult, the types of errors subjects made
remained "sensible." Second, the first two error types, although
common in difficult conditions (low height or low
numerosity), became even more common in easier conditions.
As the shape impression improved, the subjects were
able to eliminate other possible shapes and then were more
likely to err by choosing the most similar incorrect shape. The
distinction between d+++ and u+++ was very difficult to
make even when the perception of depth was quite compelling
and well sampled. The report two depth signs . . . error type
was uncommon in all conditions, but there appeared to be a
trend for this error type to become more common as numerosity
increased.

Experiment  2:  Texture  Density 

Several cues may lead to correct shape identification in the
KDE task. One cue is dynamic changes in texture density.
The shapes are generated in such a manner that, head-on (i.e.,
viewed with the object base plane in the picture plane), the
expected local dot density across the display is uniform. By
itself, the initial frame has no shape information whatsoever.
As the shape rotates, areas in the display become more dense
or sparse as the areas in the shape that they portray become
more or less slanted from the observer. Theoretically, the
observer could use this cue from subsequent frames after the
first to determine the shape. Because we are interested in
structure from motion, the changing texture density adds a
cue in addition to the relative motion cue. In Experiment 2,
we compared three conditions: (a) Both the motion and
density cues were present as before; (b) only the motion cue
was present: dot lifetimes were varied in such a way as to
eliminate the density cue by keeping local average dot density
constant across the display; and (c) only the density cue was
present: the relative motion cue was eliminated by reducing
dot lifetimes to just one frame.

Method

Subjects. Three subjects were used in the study. One was an
author of this article; two were graduate students naive to the purposes
of the experiment. Two had corrected-to-normal vision; one subject
(CFS) had vision correctable only to 20/40.

Displays. The displays were generated in a manner similar to
Experiment 1. The same lexicon of 53 shapes was used. The flat
ground surrounding each shape was extended horizontally by 20%
and was later windowed to the same 182 × 182 pixel, 4° square, so
that the sides of the displays no longer oscillated with the rotation.
Instead, points appeared and disappeared at the edges of the window.
For each shape, an instantiation of the shape was made with 10,000
points and with the large 0.5s bump height of Experiment 1. Displays
for each of the three experimental conditions were made by randomly
subsampling points from this rotating 10,000-dot shape.

Control condition: Motion and texture cues. The control condition
had both the relative motion and changing texture density cues.
A small random subsample of points was chosen, so that approximately
320 points were visible through the 4° square window. The
subsample of points was rotated and projected as before, and then
clipped so that only those points within the window were displayed.
This condition was identical to the easiest condition of Experiment 1
(0.5s, 320 dots) except for the windowing (and the lower dot contrast
described later). Examples of the density cue available in these displays
are shown in Figure 3.

[...] (of the 10,000) that would then appear in [...] 10 × 10 [...]
density throughout the display. Points were deleted or reinstated only as
needed to keep the density uniform. Although variations in texture
density were noticeable in the control displays, elimination of the
density cue did not seriously disrupt the frame-to-frame continuity of the
majority of the points: Most points remained displayed for 10 frames
or more during the 30-frame display.

The amount of scintillation was small. The average change (one
half of total dot additions plus deletions) was less than 10;
for 300-dot displays this was 3.3% scintillation. (The highest between-frame
scintillation was 13%.)

Only texture density cue. The relative motion cue was removed
in this condition, leaving the changing texture-density cue intact. For
each frame, a subset of the 10,000 points was randomly
chosen. This happened independently for each sample frame, subject
to the constraint that no point appeared in two successive frames.
Thus, no relative motion cues were available in these displays, which
looked like dynamic sparse random dot noise. On the other hand,
because the points were chosen randomly from the 10,000 points,
they had the same expected texture density as the 10,000 points did in
each frame, and indeed became more dense and sparse in exactly the
same fashion as in the first experimental condition (as illustrated in
Figure 3).
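The frame-by-frame sampling rule of the density-only condition can be sketched as follows. This is a minimal illustration, not the original display program. The function name, the seed argument, and the choice of 320 visible points per frame (borrowed from the control condition) are assumptions made for the example. Points are represented only by their indices into the 10,000-dot shape.

```python
import random


def density_only_frames(n_points, n_visible, n_frames, seed=0):
    """Pick a fresh random subset of point indices for every frame.

    No index may appear in two successive frames, so no dot can be tracked
    from one frame to the next (no relative motion cue), while every frame
    still samples the full point set uniformly (density cue intact).
    """
    rng = random.Random(seed)
    frames = []
    previous = set()
    for _ in range(n_frames):
        # Exclude the points shown in the immediately preceding frame.
        candidates = [i for i in range(n_points) if i not in previous]
        chosen = set(rng.sample(candidates, n_visible))
        frames.append(chosen)
        previous = chosen
    return frames


# ASSUMPTION: 320 visible points per frame, as in the control condition.
frames = density_only_frames(n_points=10_000, n_visible=320, n_frames=30, seed=1)
print(len(frames), "frames,", len(frames[0]), "points each")
```

Because each frame draws from nearly the whole 10,000-point set, excluding the previous frame's points changes the expected local density only negligibly.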

There were 53 possible shapes and three experimental conditions,
resulting in 159 display types. Two different displays were made of
each display type of the flat shape, and one display was made for all
other display types. There were thus 162 displays. They were displayed

Figure 3. The dynamic density cue. (Three frames are shown from
a display corresponding to u+0+r, a bump extending from the top
center to the lower right. The upper row shows frames with the
density cue. The lower row illustrates the effectiveness of removing
the density cue in the motion-only condition.)

The results are shown in Figure 4. For two
subjects (MSL and CFS), the elimination of the dynamic density
cue did not alter performance. For the third subject (JBL),
performance dropped from 81.5% to 68.5% after the density
cue was eliminated. However, it was not clear whether this
small performance change was due to the elimination of the
density cue itself or the introduction of scintillation (dot
noncorrespondences) by the process of eliminating density
cues. For two subjects (CFS and JBL), the elimination of the
relative motion cue in the density-only condition dropped
performance to levels that did not differ significantly from
chance. For the third subject (MSL), performance with the
density cue alone was significantly above chance, although
well below performance for conditions in which the relative
motion cue was available.

In the condition in which only the changing dot density
cue was available, the displays did not look 3D. The only
subject (one of the authors) who was able to perform significantly
above chance in this condition was highly familiar with
the construction of the displays. For any given shape and
rotation direction, clumps of higher density appeared first on
one side of the display, and then later on the other side, as
the object was rotated an equal amount in both directions
from the initial face-forward orientation through the course
of the 30-frame display. Performance was a matter of noting
the positions in the display at which high density occurred,
on which side of the display they occurred first, and the 2D
shape of the texture clump. Then, a response was chosen that
was most consistent with this information. This was a highly
cognitive task, and it took far longer to respond in this
condition as a result.

Figure 4. Percentage of correct shape-and-rotation identifications
for the three cue conditions of Experiment 2. (Data are shown for 3

Changing dot density is neither a necessary nor a sufficient
cue for the perception of 3D shape with these displays.
However, when the density cue is available with motion
cues, the density cue may have been used by one of three
subjects to slightly improve his responses. When the density
cue was the only cue, another one of three subjects was able
to improve his response accuracy to significantly above
chance. These results point out the importance of removing
artifactual cues from kinetic depth displays.

Scintillation cue. In the constant density condition, one
might argue that the subject is indirectly provided with
shape information by the amount of scintillation (dot
noncorrespondence) in different areas of the display. Local
scintillation could potentially be used by a subject (just as density
information was useful to one of three subjects in the
density-only condition).

The relation between local scintillation in these displays
and local density (and thereby, ultimately, local shape) in the
control displays is not simple. Points are deleted or added
only when necessary to keep the number of points in a given
locale constant. The number of points that will be added (or
deleted) is thus proportional to the local rate of change of
texture density. The difficulty in computing shape from
scintillation is that subjects are poor at judging the degree of
scintillation in a pattern, other than differentiating some
scintillation from no scintillation (Lappin et al., 1980). And
it is even more difficult to determine whether scintillation is
due to points being added or to points being subtracted, that
is, to determine the sign of the change of texture density.

We further investigated the possibility that scintillation
might have been a useful cue in an informal experiment.
Various amounts of irrelevant scintillation (in the form of
fresh, randomly occurring dots in each frame) were added to
all areas of each frame. With added scintillation that was 10
times more than that produced by the density removal program,
the quality of the image was greatly impaired. But the
ability to discriminate shapes seemed to be unimpaired. This
means that scintillation is relatively unimportant: Large
amounts do not greatly impair the display; small amounts are
not necessary to perceive KDE because, when they are masked
by large amounts of scintillation, performance hardly suffers.

In displays similar to those of Experiment 2, restricting dots
to have lifetimes of only 3 frames was another operation that
generated large amounts of scintillation. KDE identification
performance remained high even though the amount of
scintillation was large and varied randomly throughout the display
and from frame to frame (Dosher, Landy, & Sperling, in
press; Landy, Sperling, Dosher, & Perkins, 1987). All in all,
the difficulty subjects had in estimating the amount of
scintillation in the first place and the subsequent difficulty of any
computation for estimating shape from scintillation made it
unlikely that scintillation played a significant role. We
conclude that density-related shape cues are eliminated in the
motion-only displays.


Experiment 3: Equivalent 2D Task

Because of the large set of shapes, the systematic way in
which it was constructed, and the large set of possible
responses, it appears difficult to perform accurately in this task
without a global perception of shape. Indeed, except in the
case of the density-only displays of Experiment 2, all of our
subjects reported perceiving a global shape and basing their
response on this global shape percept. Nevertheless, one of
our most serious objections to previous studies of KDE was
that the subjects could have performed the experimental tasks
without a global perception of shape by using minimal,
incidental cues. Because our set of shapes was finite (53 shapes),
there were indeed potential artifactual strategies; however,
because each realization of a shape was composed of different
random dots, we were unable to discover any simple, minimal
computation for our task. The simplest computation was
equivalent to what we believe the KDE computation itself
to be.

To study alternative mental computations that might yield
correct responses in our KDE task, we developed a new
display that did not produce the 3D depth percept of KDE
but that was as equivalent as possible to the KDE display in
other respects. To perform correctly with the new display, the
subject would have to perform a computation that was
equivalent to the KDE computation except in that it is performed
by some other perceptual/cognitive process, a process that did
not yield perceptual depth. We call such a computation a
KDE-alternative computation.

Suppose that a subject chose to perform the shape
identification task by measuring instantaneous velocities at only a
small number of spatial positions and making this velocity
determination at only a single moment during the motion
sequence, for example, a moment at which velocities were the
greatest. A high velocity indicates a point far forward or far
behind the base plane. Opposite velocities indicate points at
opposite depths. Using these simple principles, it is obvious
that velocity measurements at six positions, the corners of
both triangles used in specifying the shapes, would be
sufficient to identify the shapes. Fewer measurements of velocity
made at intermediate points would suffice for identification
of our restricted set of stimuli, but they would involve
unrealistically complicated computations that were specific to
this stimulus set.

In Experiment 3, we evaluated a computation for shape
reconstruction based on a strategy of making six simultaneous
local velocity measurements at the points that corresponded
to the possible depth extrema in our stimulus set.


Method 

Choosing motion trajectories for display. In the shape identification
task (Figure 1), suppose one were to track a single point on the
surface of the shape throughout the course of the display. Initially the
point is at position (x, y, z), where x and y are the horizontal and
vertical image plane axes, respectively, and z is the depth axis. As in
Experiments 1 and 2, assume that the shape is rotated about the y
axis according to $\theta(m) = \pm 25 \sin(2\pi m/30)$, where m is the frame
number. Under parallel projection, the motion path of the point is
purely horizontal:

$$x(m) = r \cos[\varphi + \theta(m)], \qquad y(m) = y,$$

where $r = (x^2 + z^2)^{1/2}$ and $\varphi = \tan^{-1}(z/x)$ degrees.

If the subjects were to apply the local motion strategy to the shape
identification task, they would need to measure and categorize the
velocity for six such motion paths simultaneously. In Experiment 3,
the subjects were presented displays with stimuli containing six moving
patches, and they were required to categorize the local directions
of motion.
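The horizontal motion path of a tracked point can be sketched numerically. This is an illustration under stated assumptions rather than the original stimulus code: the function names are ours, the sign convention x(m) = r cos[φ + θ(m)] is one of two equivalent choices (the ± in θ covers the other), and φ is computed with `atan2` instead of tan⁻¹(z/x) so that points with x = 0 are handled.

```python
import math


def theta_deg(m, sign=+1, amplitude=25.0, n_frames=30):
    """Rotation angle in degrees at frame m: +/-25 sin(2*pi*m/30)."""
    return sign * amplitude * math.sin(2 * math.pi * m / n_frames)


def x_path(x, z, m, sign=+1):
    """Horizontal image position at frame m of a point starting at (x, y, z).

    Parallel projection of a rotation about the vertical (y) axis, so the
    vertical position never changes and only x needs to be computed.
    """
    r = math.hypot(x, z)
    phi = math.atan2(z, x)  # atan2 handles x == 0, unlike tan^-1(z/x)
    return r * math.cos(phi + math.radians(theta_deg(m, sign)))


def x_velocity(x, z, m, sign=+1):
    """Horizontal velocity (per frame) by a central difference around frame m."""
    return x_path(x, z, m + 0.5, sign) - x_path(x, z, m - 0.5, sign)


# Halfway through the display (m = 15) the rotation speed is maximal.
for z in (0.5, 0.1, 0.0, -0.5):
    print(f"z = {z:+.1f}: velocity at m = 15 is {x_velocity(0.0, z, 15):+.4f}")
```

With these definitions, points farther from the base plane move faster at the midpoint of the display, points on the base plane barely move, and points at opposite depths move in opposite directions, which is the regularity the local velocity strategy exploits.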

Displays. Each display was based on a particular shape from the
shape identification task. Each of the six motion paths portrayed in
the display was based on a motion path followed by a critical point
on the surface of the shape, as just described. The six critical points
were the projections onto the surface of the six points originally used
to generate the shapes (see Figure 1A; u and d). The motion paths
were based on the shapes with the largest heights (h = 0.5s, where s
is the width of the visible background plane).

The displays were intended to force subjects to use the strategy of
simultaneously measuring six velocities without any possibility of
recourse to using perceived 3D shape. Each display consisted of six
patches of moving random dots (Figure 5). The dots within a patch
all moved with the same velocity, and patches were spatially separated,
so that there was no perception of depth. The outline squares
of Figure 5 were not directly visible to the subject. They acted as
windows through which planes of moving random dots were seen.
Due to a setup error, dot density in Experiment 3 was slightly less
(0.83 of, rather than equal to) than the density used in the constant
density condition of Experiment 2. (This density difference was so
small that it went unnoticed at the time.)

Response mapping. There were two rows of three patches of
moving dots. Figure 5 indicates the correspondence of patch position
to where that patch's motion is visible in the original shape displays.
Spatial positions in Experiments 2 and 3 were essentially similar
except that the middle positions in each row of Experiment 2 displays
were interchanged to create the Experiment 3 displays. This was done
in order to make the response easier for the subjects. With the KDE
shape displays, the subject decided whether the three important points
were those of the u or d triangle, and then categorized the height at
each of the three corners of that triangle. In the corresponding motion
task, the subject decided whether the top or bottom row of patches
was most important, and then categorized the motion of each
patch in that row.

Figure 5. Spatial layout of the stimuli used in Experiment 3. (The
squares represent windows through which fields of moving random
dots were seen. The outline of the windows was not visible to the
subject. The label under each window denotes the position in the
shape, as in Figure 1A, that controlled the motion portrayed in that
window. For example, the motion path of all the random dots seen
in the upper middle window was the same as that taken by the point
in a shape display of Experiment 1 that was initially above position 1
in the d triangle shown in Figure 1A.)

For points at a reasonable height above the plane, the 2D
motion path was quasi-sinusoidal. That is, points moved to the left,
then to the right, then returned leftward to their starting position (or
right, then left, then right). Points with a larger initial z value moved
faster. The extreme z values generated the highest speeds, and these
always lay above the vertices of the base triangle used to generate the
shape. This meant that subjects could solve the motion task by first
judging which row contained the fastest speed, and then, for that row,
categorizing the motion in each of the three patches about halfway
through the course of the display time. Each patch was to be labeled
as moving quickly to the left (l), quickly to the right (r), or slowly if
at all (0). Note that points in the other row also moved in a
quasi-sinusoidal manner, but more slowly than the maximum speed in the
relevant row.
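The two-step strategy just described (pick the row with the fastest speed, then label each of its three patches) can be written out as a small sketch. The function name, the speed threshold, the convention that negative velocities are leftward, and the tie-breaking rule for a flat display are all assumptions made for illustration; the source specifies only the labels u, d, l, r, and 0.

```python
def classify_response(upper, lower, fast_threshold):
    """Map six patch velocities, measured about halfway through the display,
    to a four-character response code such as 'ulr0'.

    upper, lower: three signed horizontal velocities each
                  (ASSUMPTION: negative means leftward motion).
    fast_threshold: speed below which a patch counts as 'slowly if at all'
                    (ASSUMPTION: the source gives no numerical criterion).
    """
    # Step 1: choose the row containing the fastest speed.
    # ASSUMPTION: ties (e.g. the flat display) default to the upper row.
    if max(abs(v) for v in upper) >= max(abs(v) for v in lower):
        row_label, row = "u", upper
    else:
        row_label, row = "d", lower

    # Step 2: label each patch in that row as l, r, or 0.
    def label(v):
        if abs(v) < fast_threshold:
            return "0"
        return "l" if v < 0 else "r"

    return row_label + "".join(label(v) for v in row)


print(classify_response([-0.05, 0.05, 0.0], [-0.01, 0.01, 0.0], fast_threshold=0.02))
```

The example call uses made-up velocities in which the upper row is fastest, with its left patch moving leftward and its middle patch moving rightward at the midpoint of the display.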

One possible response was, for example, ulr0. This response would
indicate that the fastest speeds were in the upper row, the upper-left
patch moved right, then left, then right, the upper-middle patch
moved left, then right, then left, and the upper-right patch was moving
slowly. There were 54 possible responses (2 rows, 3 possible motion
categories for each of the three patches in that row). Because u000
and d000 denoted the same display (one in which all patches were
moving slowly), this yielded 53 distinct display types, corresponding
to the 53 distinct shape-and-rotation display types in the
shape-identification experiment.
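The count of responses and display types can be checked by brute-force enumeration. This sketch only restates the arithmetic of the paragraph above; the function names are ours.

```python
from itertools import product


def all_responses():
    """Every response code: a row (u or d) followed by three patch labels."""
    return [row + "".join(labels)
            for row in "ud"
            for labels in product("lr0", repeat=3)]


def display_type(response):
    """Collapse u000 and d000, which denote the same all-slow display."""
    return "000" if response[1:] == "000" else response


responses = all_responses()
display_types = {display_type(r) for r in responses}
print(len(responses), "responses,", len(display_types), "distinct display types")
```

The enumeration gives 2 × 3³ responses, and merging the two codes for the all-slow display removes exactly one of them.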

There were 53 possible shapes. With 2 exemplars of the flat shape,
and 1 for all other shapes, this yielded 54 displays. Motion displays
were displayed as bright white dots on a gray background. The display
background luminance was 15.6 cd/m². Each dot added an additional
24.3 µcd, viewed from a distance of 1.6 m. All other display
characteristics were the same as in Experiment 1.

Apparatus. The apparatus was the same as in Experiment 1,
except that a monochrome U.S. Pixel PX15H315I.HS monitor with
a fast, white phosphor was used.

Viewing conditions. Stimuli were viewed monocularly with goggles;
a circular aperture restricted the field of view. Luminance outside
the aperture was approximately equal to the background luminance
on the CRT, which was 15.6 cd/m². Stimuli were viewed from a
distance of 1.6 m. After each stimulus presentation, the subject keyed
responses using response buttons, and visual feedback was given on
the CRT. The room was dark, but the adaptation level was controlled
by the CRT background and the illumination of the occluding
screen.

Procedure. A block of trials consisted of 108 trials. Each of the
54 displays was viewed twice in random order. For the stimuli based
on the flat shape, two possible answers were correct (u000 and d000).
For all other stimuli only one answer was correct.

Subjects were told precisely the correct strategy to use. They were
told that they would see six patches of moving dots. They were to
determine which row contained the patches with the fastest motion
(either the upper row, designated u, or the lower row, designated d).
For that row, subjects were to categorize the motion in each of the
three patches in that row as measured about halfway through the
course of the display time. Each patch was to be labeled as moving
quickly to the left (l), quickly to the right (r), or slowly if at all (0).
After each response, the correct answers were displayed as feedback.
Other details of the procedure were identical to Experiment 1.

Subjects. Two subjects were used in the study. One was an author
of this article, and one was a graduate student naive to the purposes of
the experiment. Both had corrected-to-normal vision. Subject MSL
ran a single block of 108 trials. Subject JBL ran three blocks of 108
trials.

Results

Subject MSL scored 90.7% correct on a single block of 108
trials. In the three blocks of trials run by subject JBL, the
scores were 58.3%, 75.9%, and 88.0%, respectively. Indeed,
after a little practice, performance was quite good, equal to or
slightly better than performance in the easiest conditions of
Experiments 1 and 2, which had a comparable dot density
and range of velocities.
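The scoring rule described in the Procedure can be sketched as follows. This is a minimal illustration, not the authors' scoring program: the function names and the string encoding of a response (a row letter, u or d, followed by three patch labels drawn from l, r, and 0) are assumptions of this sketch.

```python
# Hypothetical scoring sketch for the Experiment 3 response codes.
FLAT_ANSWERS = {"u000", "d000"}  # either row is acceptable for the flat shape


def is_valid(code):
    """Return True if code is a row letter followed by three patch labels."""
    return (
        len(code) == 4
        and code[0] in "ud"
        and all(label in "lr0" for label in code[1:])
    )


def is_correct(response, target):
    """Score one trial; the flat stimulus accepts either row designation."""
    if not (is_valid(response) and is_valid(target)):
        raise ValueError("codes must look like 'ull0' or 'd0r0'")
    if target in FLAT_ANSWERS:
        return response in FLAT_ANSWERS
    return response == target


def percent_correct(responses, targets):
    """Percentage of trials scored correct, rounded to one decimal place."""
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    hits = sum(is_correct(r, t) for r, t in zip(responses, targets))
    return round(100.0 * hits / len(targets), 1)
```

With 108 trials per block, a score of 90.7% corresponds to 98 correct responses (98/108 is about 0.907), that is, 10 errors.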

There were too few trials to make an in-depth analysis of
error data. However, the most frequent motion response
errors corresponded to the two most frequent KDE errors in
Table 1 (small distortions or mislocalizations of large bumps).
For example, 8 out of the 10 errors made by MSL were
analogues of these two error types. Examples: ulll, a triple
"up" bump, was reported as dlll, a triple "down" bump; uD/
was reported as d0l0; a double bump was mistaken for a single
bump in the same location (see Figure 1). Indeed, these results
are not surprising because the velocities involved in Experiments
1 and 3 were similar. It seems likely that a very large
number of trials would be required to find any significant
differences in the error patterns in Experiment 3 and those in
Experiment 1.

Discussion 

We have introduced a new objective task for measuring the
perceptual effectiveness of the kinetic depth effect: shape
identification. With the current lexicon of shapes, it measures
whether the subject can globally determine precisely which
areas are in front of the ground and which areas are behind
the ground. We consider here some possible objections to and
some issues raised by our results.

Cues to Structure From Motion: Optic Flow or
Interpoint Distances?

In the displays of Experiment 2, in which dot density was
controlled, subjects solved the shape identification task even
though no single frame contained any information that could
have been used to infer shape. For these stimuli, at least two
frames were needed to infer shape. By definition then, the
only possible cues were motion cues.

There are at least two possible motion cues to depth: optic
flow and changing interpoint distances in the displays. That
is, subjects could be deriving shape from a global optic flow
field (instantaneous velocity vector measurements across the
field) or from measurement of interpoint distances of particular
dots over two or more frames. Models of the KDE have
been based on both optic flow (Koenderink & van Doorn,
1986) and on interpoint distances (Hildreth & Grzywacz,
1986; Landy, 1987; Ullman, 1984). To a certain extent, it is
possible to differentiate between these models by creating
displays in which dots have lifetimes of only two frames. In
such displays, a global optic flow field is available (although
noisy), and 3D structure could, in principle, be computed
from the flow field. Alternatively, some subset of the points
could be used to compute a 3D object based on
interpoint distances. However, the particular set changes
rapidly because within two frames all points have been
replaced by entirely new points, uncorrelated with those in the
preceding frames. It turns out that subjects are quite adept at
the shape identification task with such displays (Dosher et al.,
in press; Landy et al., 1987). These, and related results, are
taken as strong evidence against the interpoint distance
models (Dosher et al., in press; Landy et al., 1987). Together
with the results of the present experiment, in which changing
density was eliminated as a cue, this leaves motion flow
fields as the necessary and sufficient cue for KDE in moving
dot displays. Whether interpoint distances or other motion
cues are ever perceptually salient remain open questions.

Multiple Facets of the KDE

We have previously argued (Dosher et al., 1989; Landy et
al., 1987) that measurement of the full effect of stimulus
manipulations on the KDE requires several subject responses
in order to describe fully the richness of the percept. These
responses included judgments of coherence (whether the multidot
stimulus coheres as a single object), rigidity (does the
object stretch), and depth extent (what is the amount of
depth perceived). These different aspects of the percept are
partially correlated, but they can be decoupled by suitable
display manipulations. For example, with some subjects, the
addition of exaggerated polar perspective to a display increases
the perceived depth extent even as it decreases perceived
rigidity.

In the current experiments, this richness of the KDE percept
was not explored. We measured the extent to which the
display was effective in creating a global sensation of depth,
and hence supported objective shape identification. Other
aspects such as depth extent or rigidity were not measured.
The difference between the three depth conditions was
immediately obvious to subjects, and increasing the depth extent
displayed (within certain limits) did improve performance,
but we did not measure perceived depth extent.

Although perceived rigidity was not explicitly measured,
nonrigid percepts were spontaneously reported by subjects.
One particular example was very common. Shapes with both
bumps and concavities (e.g., u++-) were occasionally seen
in a nonrigid mode. Rather than seeing one area forward,
another one back, and the whole thing rigidly rotating,
observers perceived both areas as being in front of the object
ground and rotating in opposite directions (this percept looks
rather like a mitten with the thumb and finger portions
alternately grasping and opening). This particular nonrigid
percept occurred most often when the number of dots was
large and the depth extent was at its largest. In this stimulus
condition, with mixed-sign shapes, it is clearly visible that
the two bumps cross (in the rigid mode, one sees through the
bump to the concavity behind it when they cross). This is an
example of a failure of the "rigidity hypothesis" (Adelson,
ianiiiitmioa  alia  B  a  Veridical  3D  ■■upROrioa  Aa  b
yddBtaMicAiBP»B|Ka«Me»uUi.|iiBa;«.hoaranr-

Relation to Previous Empirical Studies

We found that shape identification performance increases
with the number of dots displayed and the extent of depth
portrayed. Neither of these results is surprising. The numerosity
result is an extension of previous, more subjective,
measures of the depth perceived in simple KDE displays
(Braunstein, 1962; Green, 1961). Increasing the number of
dots provides the observer with more samples of the motion
of the shape portrayed. Increasing depth extent increases the
range of velocities used. Both manipulations increase the
observer's signal-to-noise ratio in the task, in which noise
sources may be both external (such as position quantization
in the display and sparse shape sampling) and internal.

What is Computed in KDE?

Within measurement error, subjects performed equally well
in the motion judgment task of Experiment 3 and comparable
KDE tasks of Experiments 1 and 2. Further, the most common
confusion error was the same in all experiments. And
there is every reason to suppose that, if more data were
available, the less common errors also would be highly
correlated. In brief, we have succeeded in creating two equivalent
tasks for classifying stimuli into 53 shape categories: One is
solved by a KDE mechanism that yields a perceived 3D shape,
and the other is solved by a motion perception mechanism
that yields a perceived pattern of 2D motions. What does this
imply about the mechanism of KDE and about the technology
of KDE experimentation?

Although the specific nature of the perceptual algorithm
that extracts 3D structure from 2D motion has not yet been
established, it is reasonable to expect that it ultimately will
be. Whatever the computation, the equivalent computation
could, in principle, be carried out by some other system that
was supplied with the same raw information, in this instance,
the optical flow fields. In Experiment 3, we demonstrated that
the measurements of the optic flow fields at six points provide
sufficient information for the shape categorization task. When
the optic flow at these locations is provided to observers in a
response-compatible format, they can use this optic flow
information to categorize the stimuli in perceived 2D just as
efficiently as when they categorize KDE stimuli in perceived
3D. What is special about extracting structure from motion
is not the informational capacity of the KDE system, but the
perceptual capacity for extracting the relevant information
and providing it perceptually as 3D depth.


For extracting structure from motion, the relevant information
is optic flow. This was demonstrated in Experiment 2
(in which the artifactual nonmotion cues were eliminated) and in
experiments in which dots were given maximum lifetimes of
only two (or three) frames so that correspondence cues were
weakened and only optic flow cues survived (Dosher et al., in
press; Landy et al., 1987). The critical information in our
particular shape discrimination task is the set of local velocity
minima and maxima in the optic flow and their approximate
shape. A reasonable assumption about the structure-from-motion
computation is that the perceptual system automatically
locates the minima and maxima of relative velocities,
and transforms them into perceived depths. (Relative velocity
has long been recognized as an extremely potent depth cue
(e.g., Helmholtz, 1910/1924, p. 295; Rogers & Graham,
1979) and undoubtedly is a critical component of KDE.)
When the relevant areas of optical flow are extracted instead
by our display processor and presented to the subject as
isolated patches, the subject is still able to classify the velocity
in the patches, but the automatic perceptual conversion of
velocity into perceived depth is inhibited. Nevertheless, the
extracted velocity information is sufficient to enable accurate
classification of the stimuli when a response-compatible format
is made available.

Figure 6 illustrates the processes that are assumed to be
involved in object recognition via the KDE. From the stimulus,
the subject extracts a 2D velocity flow field. The KDE
is the process whereby 3D depth values are extracted from
the flow field. These depth values are combined with other
shape and contour information from the stimulus to yield a
3D object percept which then forms the basis for the subject's
response. A KDE-alternative computation is one that uses
the same stimulus and velocity flow field, but circumvents
the KDE computation by deriving the required response
directly from the flow field. Experiment 3 demonstrated that
a KDE-alternative computation would be possible in principle
if the subject could extract the velocities at the six most
relevant locations.
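A KDE-alternative computation of this kind can be sketched as a direct mapping from six velocity measurements to a response code. The sketch below is only an illustration under stated assumptions: the function name, the sign convention (negative means leftward), and the `quick_fraction` cutoff for separating "quick" from "slow" patches are hypothetical and are not taken from the experiments.

```python
# Hypothetical sketch of a KDE-alternative computation on six velocities.
def classify_patches(upper, lower, quick_fraction=0.5):
    """Map six signed horizontal velocities to a response such as 'ull0'.

    upper, lower: three velocities each (negative = leftward motion).
    quick_fraction: assumed cutoff, as a fraction of the peak speed,
    below which a patch is labeled 0 ("slowly if at all").
    """
    if len(upper) != 3 or len(lower) != 3:
        raise ValueError("expected three patches per row")
    peak_upper = max(abs(v) for v in upper)
    peak_lower = max(abs(v) for v in lower)
    # Choose the row containing the fastest motion (ties go to the upper row).
    if peak_upper >= peak_lower:
        row, velocities, peak = "u", upper, peak_upper
    else:
        row, velocities, peak = "d", lower, peak_lower
    labels = []
    for v in velocities:
        if peak == 0 or abs(v) < quick_fraction * peak:
            labels.append("0")  # slow or stationary patch
        elif v < 0:
            labels.append("l")  # quick leftward motion
        else:
            labels.append("r")  # quick rightward motion
    return row + "".join(labels)
```

No depth value appears anywhere in this mapping, which is the sense in which such a computation circumvents the KDE: the response is derived from the flow field alone.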

In transforming flow-field velocity into perceived depth,
there is an inherent ambiguity in sign: A given velocity can
equally well indicate depth toward or away from the observer.
This ambiguity is inherent in the optics of the display and
reflected in our scoring procedure. However, the perceptual
system tends to resolve the ambiguity consistently in nearby
locations. On those occasions in which it does not (e.g., when
it interprets leftward motion as closer in one display area and
as farther in another), the display appears to be grossly
nonrigid. The likelihood of consistent depth interpretation has
been studied by Gillam (1972, 1976) and probably can be
modeled by locally connected cooperative-competition networks
(see Sperling, 1981, for an overview of cooperation-competition
in binocular vision and Williams & Phillips,
1987, for an example of cooperation in motion perception).
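The sign ambiguity can be checked numerically. The sketch below assumes rotation about a vertical axis viewed in parallel projection (assumptions of this illustration; the function names are hypothetical). Under these assumptions a point at horizontal position x and depth z, rotated by an angle theta, projects to x cos(theta) + z sin(theta), so its image velocity at theta = 0 is proportional to z. Reversing the sign of the depth together with the direction of rotation leaves every frame unchanged.

```python
import math


def image_x(x, z, theta):
    """Horizontal image position, under parallel projection, of a point
    at (x, z) after rotation by theta radians about the vertical axis."""
    return x * math.cos(theta) + z * math.sin(theta)


def image_track(x, z, omega, n_frames, dt=1.0):
    """Image positions over n_frames for angular velocity omega (rad per frame)."""
    return [image_x(x, z, omega * k * dt) for k in range(n_frames)]


# A point in front of the axis rotating one way, and its depth-reversed
# twin rotating the other way, produce the same sequence of image positions.
near = image_track(0.3, 0.8, 0.1, 10)
far = image_track(0.3, -0.8, -0.1, 10)
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(near, far))
```

Because the two tracks coincide, the stimulus alone cannot determine which interpretation is veridical, which is why a scoring procedure has to accept both.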


KDE-Alternative Computations

It is useful to distinguish three kinds of computations: KDE,
KDE-alternatives, and artifactual non-KDE computations.
The KDE computation is an automatic perceptual computation
made, in the case of our stimuli, on velocity flow fields,
and it results in perceived depth (a 3D percept) at those visual
field locations where it is successful. A KDE-alternative
computation is a computation on velocity flow fields similar to
the KDE computation except that it is made consciously in
some other part of the brain; it results in a knowledge of the
correct response but it does not yield perceived depth: The
field is perceived as flat. An artifactual, non-KDE computation
uses an incidental property of the display to compute the
correct response and the computation may be quite unrelated
to the KDE computation. For example, the various objective
studies of KDE that we considered in the beginning of this
article all were vulnerable to computations that used only a
small portion (in some instances only the movement of a
single dot) of the stimulus information that would have been
required by a KDE computation.

Figure 6. Flowchart for KDE, KDE-alternative, and artifactual computations. (From the stimulus, the
following are assumed to be computed in sequence: 2D velocity flow field, 3D depth values [KDE
computation], a 3D object representation (which in this instance happens not to correspond perfectly
with the object represented by the stimulus), and the required response sequence. The KDE-alternative
computation computes the required response sequence directly from the 2D optic flow without an
intermediate stage of perceived 3D depth; that is, it simulates the KDE computation in another part of
the brain. An artifactual computation uses incidental stimulus cues or motion cues from only a small
part of the stimulus to arrive at a response.)

Of the five studies reviewed in the beginning of this article,
the possible artifactual computations involved 1 dot (one
study), 2 dots (two studies), and other cues (two studies). The
problem is purely technical; the possible artifactual computations
are quite different from KDE computations. There is
a great risk of admitting an artifactual computation when the
set of possible stimuli is small and when the required KDE
computation itself is relatively simple. Even though subjects
in these studies may have perceived KDE depth, a simple 2D
strategy would have improved response accuracy. Although
some of these procedures could have been improved, we
deemed it better, from the outset, to use a large set of stimuli
that can be identified only after a relatively elaborate KDE
computation. What distinguishes the present task from prior
tasks is that they admitted artifactual computations that were
shortcuts to the correct response; the present alternative
computation is an equivalent computation to KDE.

With respect to KDE-equivalent computations, we can ask
two questions: Do they ever occur, and if they do, how can
we be sure that they do not always occur? To demonstrate
that a KDE-equivalent computation can occur we first have
to know what the KDE computation itself is, and then to
perturb the stimulus so that the automatic KDE computation
cannot occur. In our experiment (and probably more generally),
the essential KDE computation is the discovery of local
velocity minima and maxima, and the consistent depth labeling
of these minima and maxima. In Experiment 3, six
stimulus areas around the velocity extrema were extracted
from the KDE stimulus, and (in order to avoid the automatic
KDE computation) they were presented as isolated squares.
The subjects were able to label these areas consistently with
respect to velocity (not depth, because the display was
perceived as flat). Thus, subjects performed a KDE-equivalent
task by means of a KDE-equivalent computation. Furthermore,
the pattern of errors in the equivalent task corresponded
to the previous error pattern in the KDE task. Although there
are necessarily some differences between the KDE stimuli
and the alternative stimuli, our strong result makes it clear
that, along with artifactual computations, the
KDE-alternative computation needs to be considered in
future KDE experiments.

Artifactual computations are most easily discriminated
from KDE computations by manipulating stimulus parameters.
Stimulus cues that might support an artifactual computation
are removed, masked, or are rendered useless by irrelevant
variation. If response accuracy survives, we have increased
confidence that it is based on a KDE computation.

KDE and KDE-alternative computations use the same
stimulus attributes; they differ in where in the brain the
computation is made. Two tools for discriminating between
these computations are introspection and dual tasks. For
example, all subjects, without conscious effort, immediately
perceive our KDE stimuli as solid 3D objects. When subjects
honestly report that they perceive 3D depth in dynamic
KDE stimuli, by definition, they have performed a KDE
computation. The problem is that KDE may not be the only
computation being performed. For complex stimuli such as
ours, however, it is hard to imagine that a subject could be
performing a useful alternative computation without awareness.
Indeed, the discovery of an alternative computation for
KDE is the structure-from-motion problem, and the solution
proposed in Experiment 3 may be the first workable solution
for stimuli of this type. It would be remarkable if subjects,
even sophisticated subjects, discovered the solution in the
course of viewing the stimuli. Still, even in this case, but
especially with simpler stimuli, it would be better to use a
formal procedure to exclude alternative computations. This
requires, for example, (a) isolating the alternative computation,
as in Experiment 3, (b) finding a concurrent task or
similar manipulation that selectively interferes with the
alternative computation relative to the direct KDE computation,
and (c) using the modified or dual tasks with the original
stimuli.

An alternative KDE computation is analogous to an
alternative stereoptic depth computation that is carried out by
monocularly examining the left and right members of a
stereogram. When stimuli are designed to take advantage of
the exquisite sensitivity of stereopsis, an alternative monocular
computation that uses remembered disparities is not feasible,
even though it may be learnable in special cases. The
same is undoubtedly true for KDE and alternative KDE
computations: For complex KDE stimuli, viewed briefly, the
alternative computation is simply out of the question. However,
the problem of interpreting experimental results has not
been alternative KDE computations but artifactual non-KDE
computations. The best way to avoid subsequent problems of
interpretation is to use complex stimuli, like the 53-shape
stimulus set used here, that are matched to and challenge the
ability of the human KDE computation.

Summary  and  Conclusion 

A new shape identification task for measuring KDE performance
is proposed. With its lexicon of 53 shapes, accurate
identification requires either an accurate 3D shape percept or
a KDE-alternative computation based on simultaneous measurements
of 2D velocity in six positions of the display.
Performance in the shape identification task improved with
increased numerosity in a multidot display and with an
increase in the amount of depth portrayed. Shape identification
was not mediated by incidental texture-density cues but
rather by motion cues derived from optic flow. The objective
shape identification task is proposed as a sensitive measure of
the critical aspect of kinetic depth performance. It is proposed
that the structure-from-motion algorithm used by subjects to
solve the KDE shape identification task involves finding local
2D velocity minima and maxima and assigning depth values
to these locations in consistent proportion to their velocities.
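The proposed algorithm can be illustrated with a short sketch. It assumes a one-dimensional velocity profile sampled at evenly spaced display positions, a single proportionality constant (`gain`), and a single global sign choice; these names and simplifications are hypothetical and are meant only to make the two steps concrete (find the local velocity extrema, then assign depths in consistent proportion to velocity).

```python
# Hypothetical sketch: local velocity extrema and proportional depth labels.
def local_extrema(values):
    """Indices of strict local minima and maxima (interior samples only)."""
    found = []
    for i in range(1, len(values) - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            found.append(i)
    return found


def assign_depths(velocities, gain=1.0, sign=1):
    """Depth at each velocity extremum, proportional to its velocity.

    sign is +1 or -1 and is applied to every location alike, which is
    what makes the labeling consistent across the display.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return {i: sign * gain * velocities[i] for i in local_extrema(velocities)}


profile = [0.0, 1.0, 2.0, 1.0, 0.0, -1.5, 0.0]
depths_a = assign_depths(profile, sign=1)   # one depth interpretation
depths_b = assign_depths(profile, sign=-1)  # the depth-reversed interpretation
```

The two sign choices give mirror-image depth maps from the same velocities, corresponding to the two interpretations that the optics of the display leave open.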
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Correction  to  Driver  and  Baylis 

In the article, "Movement and Visual Attention: The Spotlight Metaphor Breaks Down," by
Jon Driver and Gordon C. Baylis (Journal of Experimental Psychology: Human Perception and
Performance, 1989, Vol. 15, No. 3, 448-456), the display durations were incorrect and should
be doubled to give the correct figures. Each display frame actually lasted 40 ms. Thus, total
display duration was 200 ms in Experiments 1, 3, and 4 and was 120 ms in Experiment 2.

Ratings of Kinetic Depth in Multidot Displays


Barbara A. Dosher

Columbia University

Michael S. Landy and George Sperling

New York University


Subjects saw kinetic depth displays whose shape (sphere or cylinder) was defined by luminous
dots distributed randomly on the surface or in the volume of the object. Subjects rated perceived
3-D depth, rigidity, and coherence. Despite individual differences, all 3 ratings increased with
the number of dots. Dots in the volume yielded ratings equal to or greater than surface dots.
Each rating varied with 3 of 4 factors (shape, distribution, numerosity, and perspective), but the
ratings either between trials or between conditions were often uncorrelated. Object shape affected
rigidity but not depth ratings. Veridically perceived polar displays had slightly lower rigidity but
higher depth ratings than parallel projection displays. (Reversed polar displays were always grossly
nonrigid.) The interaction of ratings and stimulus parameters requires theories and experiments
in which different KDE ratings are not treated interchangeably.


When a two-dimensional (2-D) projected image corresponds
to a three-dimensional (3-D) object that is rotating,
viewers frequently perceive an object with depth. Because
rotation induces apparent 3-D depth even when isolated still
views of the object fail to induce perceived depth, the
phenomenon is called the kinetic depth effect or KDE (Wallach
& O'Connell, 1953). In this article, experiments are discussed
that consider the perception of dot displays, in which each
stimulus consists of illuminated dots on an otherwise invisible
object. It will be demonstrated that there are a number of
partially decoupled aspects to the perception of these displays
under motion: (a) coherence, whether all dots in the display
are seen as constituting a single object; (b) depth, the amount
of 3-D depth seen in the display; (c) rigidity, whether those
illuminated dots that are perceived as constituting a coherent
object also are perceived as maintaining their relative 3-D
positions (rigid appearance) or as changing their relative
positions (nonrigid, rubbery appearance).

There is a large body of literature examining the function
of various kinds of stimulus variables in the kinetic depth
effect. Some of the classic stimulus variables include the
number of elements defining the stimulus, element shape,
occlusion, perspective, correspondence, element density, and
rotation speed. The effects of these stimulus variables were
examined by a variety of dependent measures: global "goodness"
judgments (Andersen & Braunstein, 1983; Braunstein,
1962; Braunstein & Andersen, 1984; Green, 1961; Petersik,
1980), qualitative motion categorization (surface, rotary,
oscillatory; Caelli, 1979, 1980), judgments about objective
rotation direction (Braunstein, 1962, 1977; Petersik, 1979,
1980), perceived curvature or shape (Braunstein & Andersen,
1984; Todd, 1984), and proportion of corresponding elements
across frames (Lappin, Doner, & Kottas, 1980). One question
that arises is whether the choice of dependent measure is of
no consequence. Are all these measures simply reflections of
a unitary aspect of the kinetic depth percept? In order to
answer this question, we examined the independence of the
three aspects of the percept listed above by collecting three
separate responses on every trial in experiments that varied
some important stimulus variables.

As a concrete example, consider an early set of experiments
by Green (1961). He examined, among other factors, the
importance of the number of stimulus elements on the KDE.
Subjects were asked to rate displays on a scale that combined
the notions of rigidity and coherence, as defined here. The
label goodness is used here to describe Green's combined
rating scale, in order to distinguish it from our use of the
distinct labels rigidity and coherence (which Green used
interchangeably). Green demonstrated that the number of stimulus
elements was a potent factor in determining the goodness
of a perceived object under various forms of rotation; generally,
the more stimulus elements, the higher the rated goodness,
with the largest increments occurring with the number
of elements under 32. In principle, the increment in goodness
could have reflected some unspecified weighting of coherence
and rigidity. It is not clear whether numerosity affected one
or both of these aspects of the percept primarily, nor is it clear
how it affected perceived depth of element trajectories.

Here we investigated element numerosity, as well as a
number of other factors that may vary in viewing 2-D projected
images of objects. In particular, we examined one image
projection factor (parallel projection versus perspective
projection, with projection distance at three times object radius)
and three object factors (the number of elements representing
an object, from 4 to 80 elements; whether the elements
representing the object were entirely on the surface or distributed
throughout the volume; and the strength of density cues
to depth in a still frame, by using different forms). Our aim
was to determine the effect of these KDE stimulus variables
on ratings of coherence, depth, and rigidity.

Our results corroborate some findings of previous investigators,
for example, that the number of dots in the object and
the presence of polar perspective can add to the strength of
KDE. However, we also show that these stimulus variables
do not generally affect all three aspects of the KDE percept
equally and that there are many subtleties and complexities
in the KDE.

Method

Because of the large number of stimulus variables, the study was
divided into three separate experiments. The experiments were
conducted with the same subjects and with the same procedures, except
as noted.


Subjects 


There were 4 subjects in the experiments, including 2 of the authors
of this article and 2 students. The students were paid for their
participation. Three subjects had normal or corrected-to-normal
vision; subject CFS could be corrected only to 20/40.

Apparatus 

All stimuli were computer generated, and the display and response
collection was computer controlled. Experiment 1 and a pilot
experiment used a point/vector display controller (Kropfl, 1975) and an
HP1304A display monitor. Display resolution was 1024 x 1024
pixels. Experiments 2 and 3 used a raster display controller, Adage
RDS-3000, and a Conrac 7211C19 RGB color monitor. Display
resolution was 512 x 512 pixels. Experiment 1 used binocular viewing
in a completely dark room. In Experiments 2 and 3, subjects viewed
the stimuli monocularly through a reduction tube, with an aperture
slightly larger than the stimuli. Hence, weak stereo cues to flatness
may have been present in Experiment 1, but not in Experiments 2
and 3.

Stimuli 

Stimuli consisted of random white dots scattered on the surface or
throughout the volume of invisible spheres and cylinders. The
probability distribution used for dot placement was uniform across the
surface (or through the volume) in each case, but choices of dots were
constrained so as to fill the surface or volume fairly evenly by
partitioning into equal-area (or equal-volume) segments and putting
equal numbers of dots in each segment. Five stimulus parameters
were varied. First, there were two types of objects, a sphere of diameter
2° of visual angle and an upright cylinder of height 2° of visual angle
and cross-sectional diameter 2° of visual angle. The number of dots
was varied from 4 to 80. These dots were either positioned on the
surface or in the volume of the object being simulated. Stimuli were
either presented in parallel projection (i.e., with no perspective) or
with an exaggerated amount of polar perspective (corresponding to a
viewing distance of three times the object radius, far smaller than the
actual viewing distance). All stimuli were rotated about a vertical axis
through the center of the simulated object. Stimuli were either rotating
front-left or front-right, although this distinction is only meaningful
for the stimuli with polar perspective. Single-frame views of some
sample stimuli are shown in Figure 1.


Figure 1. Single frames from some sample stimuli varying in
numerosity, distribution, and form.
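
The dot placement, rotation, and projection described under Stimuli can be sketched roughly as follows. This is a minimal illustration, not the original display code: the function names, the use of angular sectors as the equal-area segments, the unit object radius, and the restriction to surface dots on a cylinder are all assumptions made here for brevity.

```python
import math
import random


def cylinder_surface_dots(n_dots, n_segments=4, radius=1.0, height=2.0, rng=None):
    """Place dots on the curved surface of an upright cylinder.

    The surface is split into n_segments equal-area angular sectors and
    each sector receives the same number of dots; placement is uniform
    within a sector. (The sector partition is an assumed stand-in for
    the equal-area segments mentioned in the text.)
    """
    rng = rng or random.Random(0)
    if n_dots % n_segments:
        raise ValueError("n_dots must be a multiple of n_segments")
    per_segment = n_dots // n_segments
    width = 2.0 * math.pi / n_segments
    dots = []
    for s in range(n_segments):
        for _ in range(per_segment):
            theta = (s + rng.random()) * width
            y = (rng.random() - 0.5) * height
            dots.append((radius * math.sin(theta), y, radius * math.cos(theta)))
    return dots


def rotate_about_vertical(dot, angle_deg):
    """Rotate a 3-D point (x, y, z) about the vertical (y) axis."""
    x, y, z = dot
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def project(dot, viewing_distance=None):
    """Parallel projection if viewing_distance is None, else polar projection.

    For polar projection the eye is assumed to sit on the +z axis at
    viewing_distance from the object center; points at z = 0 keep their
    parallel-projection position.
    """
    x, y, z = dot
    if viewing_distance is None:
        return (x, y)
    scale = viewing_distance / (viewing_distance - z)
    return (x * scale, y * scale)


# One 36-frame rotation (10 degrees per frame) of a 16-dot cylinder,
# in parallel projection and with a viewing distance of 3 object radii.
dots = cylinder_surface_dots(16)
frames_parallel = [[project(rotate_about_vertical(d, 10 * f)) for d in dots]
                   for f in range(36)]
frames_polar = [[project(rotate_about_vertical(d, 10 * f), viewing_distance=3.0)
                 for d in dots] for f in range(36)]
```

A volume distribution could be obtained in the same spirit by also drawing the radius of each dot (for example, `radius * math.sqrt(rng.random())` for a uniform density across the cross-section); that variant is likewise an assumption rather than the documented procedure.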


Procedure 

On each trial, the subject was shown a fixation target, which then
disappeared and was followed shortly by one rotation of the stimulus.
(See Table 1 for details of rotation speeds, etc.) After the stimulus
presentation was complete (approximately 2 s), four responses were
required of the subject. First, the subject indicated the direction of
rotation of the object (front-left or front-right). These responses were
used in polar projection displays to determine whether the subject
perceived the object in the veridical or the reversed mode. Then,
three different ratings of the percept were required: depth, coherence,
and rigidity.

Depth rating. The subject indicated the amount of depth perceived
in the stimulus on a scale from 1 to 5. Given that all stimuli
were based on objects rotating about a vertical axis, depth was related
to an inferred "top view" of the stimulus. The subject was shown the
top views (Figure 2) to facilitate his or her rating. The most depth, 5,
was associated with a perceived circular path for each dot; the least
depth, 1, was associated with no perceived depth and hence an
oscillatory linear path for each dot.

Coherence rating. The next rating, also on a scale of 1 to 5, was
of the perceived coherence of the multidot display. A rating of 5
indicated the greatest coherence (i.e., all the dots held together as one
object). A rating of 4 indicated that a few dots did not cohere, 3
indicated that the display broke up into two distinct objects
(segmentation), 2 indicated that three or more objects were perceived,
and 1 indicated there was no perceived coherence whatsoever.

Rigidity rating. Perceived rigidity was rated on a scale from 1 to
5, with a 5 indicating one or more totally rigid objects, and lower
numbers indicating more and more nonrigidity or "rubberiness."


Figure 2. The inferred top views of the stimuli that were used to
define the five levels of perceived depth ratings.

Subjects judged all three aspects of each percept. This allowed us
to relate the three aspects on an individual trial basis. Had the three
judgments been collected separately, the relation between the three
judgments would have been available only at the level of the mean.

Designs 

Experiments 1, 2, and 3 differed in the factors varied and in the
display devices used. Table 1 summarizes the design and viewing
conditions for each experiment. All designs were fully crossed in each
included factor. Left and right veridical rotation direction was also a
factor in each experiment. Stimulus order was randomized within
block, and each block consisted of one token of each stimulus type,
yielding 64 stimuli per block in Experiment 1 and 32 stimuli per
block in Experiments 2 and 3. A different random token of each
stimulus type was generated for each of the blocks of each experiment. A
pilot experiment yielded no effect on any response ratings of object
size (2° of visual angle vs. 4° of visual angle); 2° of visual angle was
used subsequently. The polar perspective manipulation was defined
with respect to object radii, so that the object size manipulation also
varied the mismatch between actual and appropriate viewing distance
for the degree of perspective. This and some of the current work was
originally reported in Landy, Dosher, and Sperling (1986).
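
The block sizes quoted above follow directly from crossing the factors listed in Table 1 with rotation direction. The short sketch below enumerates the stimulus types under that reading; the factor labels, the helper names, and the seeded shuffle used for within-block randomization are illustrative assumptions, not the original experiment software.

```python
import random
from itertools import product

ROTATION = ("front-left", "front-right")
DISTRIBUTION = ("surface", "volume")
PERSPECTIVE = ("parallel", "polar")


def stimulus_types(numerosities, forms, luminances=(None,)):
    """Fully cross the included factors, rotation direction among them."""
    return list(product(numerosities, forms, DISTRIBUTION, PERSPECTIVE,
                        luminances, ROTATION))


def block_order(types, seed):
    """One block: every stimulus type once, in a random order."""
    order = list(types)
    random.Random(seed).shuffle(order)
    return order


exp1 = stimulus_types((4, 8, 16, 36), ("cylinder", "sphere"))
exp2 = stimulus_types((36, 48, 64, 80), ("cylinder",))
# Luminance levels (in microcandelas per dot) as read from Table 1.
exp3 = stimulus_types((48,), ("cylinder",), luminances=(0.86, 3.02, 11.52, 42.72))

print(len(exp1), len(exp2), len(exp3))  # 64 32 32
```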

Results 

The results for Experiments 1 and 2, pooled across subjects,
are shown in Figure 3. There were significant individual
differences, discussed below, and so statistical analyses were
performed as within-subject analyses of variance (ANOVAs).
The 12 trial-type replications that resulted from collapsing
over rotation direction and test block formed the random
factor. These replications represented responses to 12 distinct,
randomly generated stimuli of each type. Table 2 lists the
significance levels associated with each rating, factor, subject,
and experiment, along with a qualitative coding of the direction
of the results. Table 2 thus gives a quick summary of the
consistency both between subjects and within subjects across
experiments. Table 3 summarizes the results of previous
related experiments. Notice that the current set of experiments
includes factors and ratings that are either unrepresented in
the literature or represented by a questionable combined
measure.


Numerosity. In Experiment 1, in which the number of
dots ranged from very small to moderate in number, all three
ratings for 3 of the 4 subjects were increased by increasing the
number of dots. The 4th subject showed a very different
behavior (e.g., see Figure 4). The four-dot stimuli yielded very
high depth ratings for this subject. The subject mentioned
afterward that the stimuli reminded him of organic chemistry
drawings and yielded a vivid percept.

In Experiment 2, in which the number of dots was moderate
to large, the effect of the number of dots was less dramatic.
Depth rating increased slightly and saturated at these high
numerosities. At these larger levels of numerosity, the effects
of numerosity on coherence and rigidity were small. There
were no significant effects on coherence ratings, and numerosity
was related to rigidity ratings for only 2 of 4 subjects. In
summary, by increasing dot numerosity, all three ratings
increased, up to a point, and then saturated. Depth ratings
appeared to increase and saturate in a continuous manner,
whereas coherence and rigidity ratings were high for all but
the sparse displays (eight or fewer elements). These findings
are in general agreement with those of Green (1961) over
similar ranges of numerosity. However, Green's judgment
was one of overall goodness and more nearly agrees with the
depth judgments reported here.

Intensity. In Experiment 3, in which the intensity of displays
was increased from 0.86 to 42.7 μcd/dot, there was no
significant effect on ratings (with the exception of a single
subject on a single rating). We ruled out an effect of varying
display types in which overall stimulus intensity (contrast)
varies in the visible range. (But see Dosher, Landy, & Sperling,
in press, for manipulations of intensity very near to threshold,
which do affect kinetic depth performance.)

Form. For conditions in which a spherical shape was
directly contrasted with a cylinder (Experiment 1), the sphere
was rated more rigid than the cylinder by all subjects and
more coherent by 3 of the 4 subjects. The higher rigidity
ratings for spheres overall were actually due to a strong
interaction between form and perspective. Rigidity ratings were
differentially lower for cylinders under perspective. The
sphere gives less representation to dots that are substantially
affected by the projection factor (far from the axis of rotation),
and the increase in perceived nonrigidity may have resulted


Table 1
Experimental Factors and Conditions

Experiment  Numerosity      Form          Distribution  Perspective  Luminance
                            (cylinder or  (surface or   (parallel
                            sphere)       volume)       or polar)
1           4, 8, 16, 36    both          yes           yes          1.45 μcd/dot^a
2           36, 48, 64, 80  cylinders     yes           yes          3.02 μcd/dot^b
3           48              cylinders     yes           yes          0.86, 3.02, 11.52,
                                                                     42.72 μcd/dot^b

^a Point plot display, resolution 1024 x 1024 pixels; 36 new frames per 360° rotation (or 10° per frame),
60 ms per new frame, or 2.16 s per full rotation; dark room; binocular free viewing; viewing distance
1.1 m; object diameters 2° visual angle (parallel perspective).

^b Raster display, resolution 512 x 512 pixels; 36 new frames per 360° rotation (or 10° per frame); 66.67
ms per new frame, or 2.4 s per full rotation; dim room (8 cd/m²) with light-tight viewing hood;
monocular viewing through a reduction aperture; viewing distance 1.6 m; object diameters 2° visual
angle (parallel perspective).


Figure 3. Response data for all ratings and stimulus manipulations
in Experiments 1 (upper panel) and 2 (lower panel), pooled across
subjects. (The parameter is the particular rating made: perceived
depth (circles), rigidity (squares), and coherence (triangles). In each
panel, the first set of curves is for the number of dots in the stimulus,
the second for the effect of distributing the dots across the surface or
throughout the volume of the object, and the third for the effect of
perspective transformation (parallel or polar).)


in object breakdown (segmentation, incoherence) in some
cases. Braunstein and Andersen (1984) also compared spheres
to cylinders. They used each form as the base for elliptical
distortions and found that sensitivity to minor axis variation
(flatness of the elliptical orbits) was slightly greater when the
base form was a cylinder than when the base form was a
sphere. However, this was a cross-experiment comparison
with different groups of subjects in the different conditions.

Distribution. The effect of dot distribution (in the volume
or on the surface) was generally small with significant individual
variation (see Figures 3 and 4 and Table 2). Close examination
of Table 2 suggests that distribution was more important
when numerosity was large and when there was reduced
single-view shape information (i.e., for the cylinder,
Experiments 2 and 3). In these conditions, distribution in the volume
increased depth ratings for 3 of the 4 subjects and might have
improved coherence for some subjects as well.


Green's (1961) overall goodness measure showed slightly
higher scores for surface representations than for completely
random placements in the volume of a cube, but an enormous
benefit for point representations with regular placements in
the volume. Our random sampling incorporated partition
equality and hence represented a compromise between
Green's random and regular conditions. Dots in the volume
might have increased depth ratings because a range of
intermediate velocities was represented in the cylinders, whereas
in surface representations of cylinders, all dots were traveling
at more nearly the same velocity, except at the edges of the
object. To the extent that differential velocity supported depth
segregation (Braunstein & Andersen, 1981), representation of
intermediate velocities may have been useful. The ability of
distribution to strongly affect the kinetic depth percept may
have also depended on the unavailability of other strong cues
to shape such as perspective, texture density, or contour (see
the discussion of Figure 5 below).
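
The velocity argument can be made concrete with a small sketch. Under parallel projection, a dot at distance r from a vertical rotation axis has horizontal image position x = r sin(θ), so its image speed peaks at ωr as it crosses the midline. The code below compares those peak speeds for surface and volume dots on a cylinder; the function name, the uniform-in-cross-section sampling rule for volume dots, and the use of the 2.4 s rotation period and 1° radius from Table 1 are assumptions chosen for illustration.

```python
import math
import random


def peak_image_speeds(n_dots, in_volume, radius_deg=1.0, rotation_period_s=2.4, rng=None):
    """Peak horizontal image speed (deg of visual angle per second) per dot.

    Surface dots all lie at the full radius; volume dots are drawn
    uniformly over the circular cross-section (radius = R * sqrt(u)).
    """
    rng = rng or random.Random(1)
    omega = 2.0 * math.pi / rotation_period_s  # angular velocity in rad/s
    speeds = []
    for _ in range(n_dots):
        r = radius_deg * math.sqrt(rng.random()) if in_volume else radius_deg
        speeds.append(omega * r)
    return speeds


surface = peak_image_speeds(48, in_volume=False)
volume = peak_image_speeds(48, in_volume=True)
print("surface: min %.2f, max %.2f" % (min(surface), max(surface)))
print("volume:  min %.2f, max %.2f" % (min(volume), max(volume)))
```

In this sketch every surface dot shares one peak speed, while volume dots span a range of lower values, which is the "range of intermediate velocities" referred to in the text.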

Perspective. For all subjects, the rigidity ratings were
decreased by adding polar perspective. The effect of perspective
on coherence was small and depended on the subject. A
collateral analysis of the polar perspective trials that sorted
those occasions on which the perceived rotation direction
disagreed with the intended rotation direction found that
most, but not all, of the decrease in rated rigidity with polar
projection occurred when the observer perceived the stimulus
in the reversed mode (see also Gregory, 1970; Schwartz &
Sperling, 1983). Thus, when polar displays were perceived in
their reversed mode, they appeared grossly nonrigid; when
polar displays were perceived veridically, they appeared
slightly less rigid than the corresponding parallel displays.

Neither our polar stimuli nor our parallel stimuli were
viewed at the appropriate viewing distance. The parallel
stimuli would have to be viewed from infinity; the polar stimuli
from 6 cm; the actual viewing distances were on the order of 1
m in the various experiments (1.1 m or 1.6 m; see Table 1).
(Had we produced appropriately projected objects for the 1 m
viewing distance, they would have been negligibly different from
the actual parallel stimuli. When viewed at the appropriate
viewing distance of 6 cm, our polar displays possessed little
depth, largely a consequence of the large scale.) The greater
mismatch between appropriate viewing distance and the actual
viewing distance for polar stimuli conceivably might have
accounted for the fact that veridically perceived polar stimuli
received slightly lower rigidity ratings than parallel stimuli.
But this distance mismatch does not bear on the overwhelming
cause of nonrigidity in polar displays: that stimuli are perceived
in reversed mode. Even the secondary effect of polar projection
on rated rigidity may have depended only weakly on
projection/viewing distance mismatch. As noted previously, a pilot
study in which object size was varied by a factor of 2:1
(producing a change between projected and actual viewing
distance of 2:1) had no significant effect on any rating. Finally,
Cutting (198?) found little impact of mismatch between
simulated and actual viewing distances.
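
The 6 cm figure can be checked with a line of trigonometry: an object subtending 2° has a radius of one degree of visual angle, and the simulated viewing distance was three object radii. The sketch below carries out that arithmetic for the viewing distances listed in Table 1; the function name and default arguments are assumptions introduced here.

```python
import math


def appropriate_polar_distance_m(actual_distance_m, object_diameter_deg=2.0,
                                 distance_in_radii=3.0):
    """Distance at which the polar projection would be geometrically correct.

    The physical object radius on the screen is the actual viewing
    distance times tan(half the angular diameter); the simulated
    viewing distance is a fixed multiple of that radius.
    """
    radius_m = actual_distance_m * math.tan(math.radians(object_diameter_deg / 2.0))
    return distance_in_radii * radius_m


for d in (1.1, 1.6):
    cm = 100.0 * appropriate_polar_distance_m(d)
    print("actual %.1f m -> appropriate %.1f cm" % (d, cm))
```

For the 1.1 m viewing distance this comes to a little under 6 cm, in line with the value quoted in the text; for 1.6 m the same calculation gives roughly 8 cm.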

The  rigidity  and  coherence  results  reported  here  agree  with 
the  reported  relationship  between  the  amount  of  perspective 
and  the  ability  to  infer  the  intended  rotation  direction  (Braun- 

Table 2
Significant Factors in Experiments

Experiment/Subjects
Numerosity
Form
Distribution
Perspective
D × P

1 

MSL 

(small  numbers) 

Depth  judgment 

ns  +.*•• 

ns 

BAR 

ns 

+,••• 

+.•• 

• 

CFS 

+,••• 

ns 

+,ns 

ns 

ns 

RHS 

u... 

ns 

ns 

2 

MSL 

(large  numbers) 

_ 

+»••• 

+.••• 

BAR 

+,••• 

+.*• 

• 

CFS 

ns 

t 

RHS 

— 

ns 

ns 

3 

MSL 

t 

BAR 

+,• 

+»• 

CFS 

_ 

+,ns 

t 

RHS 

— 

— 

ns 

ns 

1 

MSL 

(small  numbers) 

+.••• 

Coherence  judgment 

+,♦  ns 

BAR 

ns 

• 

CFS 

ns 

ns 

RHS 

+,••• 

ns 

2 

MSL 

(large  numbers) 
ns 

ns 

ns 

ns 

BAR 

ns 

... 

+,••• 

••• 

CFS 

ns 

*»••• 

ns 

RHS 

ns 

... 

ns 

3 

MSL 

... 

+.♦ 

ns 

t 

BAR 

... 

... 

+.• 

•• 

CFS 

+.• 

ns 

RHS 

— 

— 

• 

1 

MSL 

(small  numbers) 
+.• 

Rigidity  judgment 

+.••  +*t 

BAR 

ns 

CFS 

+.t 

ns 

RHS 

X.*»* 

+»• 

+.• 

">*** 

• 

2 

MSL 

(large  numbers) 
ns 

ns 

ns 

BAR 

ns 

CFS 

ns 

RHS 

ns 

ns 

3 

MSL 

... 

ns 

^•*** 

ns 

BAR 

+»• 

ns 

CFS 

... 

ns 

ns 

RHS 

— 

— 

ns 

ns 

Note. The p values (see below) are the significance of corresponding F values from an analysis of
variance for each subject treating rotation direction and tokens of stimuli as the random factor. The
symbols + and - indicate the pattern of the effect and can be referenced to the legend list below for
each factor. See the text for a discussion of the interaction of distribution and perspective. Numerosity:
+, increasing with number of points; saturates with large number of points; U, U-shaped function of
number of points; X, best for smallest and largest number of points. Form: +, sphere > cylinder.
Distribution: +, volume > surface. Perspective: +, polar > parallel. D × P = interaction of distribution
and perspective. ns = not significant. Dashes = not applicable.
†p < .100. *p < .05. **p < .01. ***p < .001.

Table 3
Summary of Related Results

Judgment
Numerosity
Form (cylinder or sphere)
Distribution (surface or volume)
Perspective (parallel or polar)

Depth

Coherence

-Petersik, 1980

-Braunstein, 1962

+Braunstein, 1962
••Petersik, 1980

Rigidity

-Petersik, 1980

-Petersik, 1979
-Braunstein, 1977

Combined

+-Green, 1961

-Braunstein & Andersen, 1984
-Green, 1961

-Braunstein, 1962
••Braunstein, 1977

Note. The symbols + and - indicate the pattern of the effect and can be referenced to the note for
Table 2. The symbol •• indicates no effect. A summary of some relevant factors in prior experiments
follows: Braunstein (1962), dots in volume of cube, N = 2-6, depth and coherence/rigidity judgments;
Braunstein (1977), dots in volume of sphere, N = 1,000, varied perspective in horizontal and vertical
dimensions, direction and coherence/rigidity judgments; Braunstein & Andersen (1984), dots on surface
of sphere or ellipses, N = 140-160, shape and quality judgment; Green (1961), dots or line elements in
volume or on surface of cube, N = 4-64, goodness rating (combined segmentation and rigidity); Petersik
(1979), dots in volume of sphere, N = 4-45, depth and direction rating; Petersik (1980), dots in volume
of sphere, N = 5-60, depth and direction rating.


Stein, 1977; Petersik, 1979). The difference in the effect of
perspective  on  rated  rigidity  and  on  rated  coherence  may 
explain  the  inconsistent  results  of  Braunstein,  who  found  that 
perspective  decreased  a  combined  rating  of  coherence  and 
rigidity  in  one  study  (1962),  but  had  no  effect  in  another 
(1977). 

Perspective generally increased the rated depth (shape) of
the percept (although not all contrasts were significant; see
Table 2). A prior study by Braunstein (1962; see Table 3) also
found that perspective improved a "strength of depth" rating.
Petersik (1980) found that depth judgments were not affected
by perspective. However, this same study found no effect of
numerosity (N = 5-60) on depth, which suggests that the
experiment had insufficient power.

Interaction of Perspective and Distribution. By adding polar
perspective, the rated depth was increased. Distributing
dots throughout the volume of the object had the same effect.
However, when the two were combined, a further increase
was not achieved. This interaction between perspective and

Figure 4. Coherence ratings in Experiment 1. (The parameter is the
particular subject. Note the large individual differences.)


dot distribution is illustrated in Figure 5, and the significance
levels for each subject are listed in Table 2. (Here again, the
effect of distribution was greater in high-numerosity cylinders,
Experiments 2 and 3.) As suggested above, some factors such
as distribution may be more likely to control the percept in
the absence of other strong cues to shape.

A Large Individual Difference. Occasionally, individual
differences were very striking. An example of this is shown in
Figure 4. Here the coherence ratings for all conditions of
Experiment 2 are shown for individual subjects. Subject RHS
was the only subject for whom the increased number and
distribution of dots in the volume of the object decreased the
coherence of the kinetic depth percept. Individual differences
presumably occurred in earlier studies, but were undetected
because prior studies collected few observations from each
subject and performed cross-subject analyses. For another
example of large individual differences in KDE, see Dosher,
Sperling, and Wurst (1986).

Three Ratings Are More Informative Than One. So far,
we have described the empirical results with respect to
manipulations of dot numerosity, perspective, and so forth. What
was perhaps most important was the added information
gained by having multiple ratings of the stimuli. These ratings,
each of which can be (and has been in the literature) construed
as a measure of the "strength" of a KDE percept, did not
necessarily covary. In many cases, as we have seen, an
experimental manipulation had a different effect on different
ratings. For example, shape significantly affected mean ratings
of rigidity, but had little or no effect on depth ratings. At high
numerosity, further increases in numerosity continued to
increase depth ratings, but did not affect coherence ratings,
and so forth.

Correlations Between Ratings. It was also possible to do
a finer-grained analysis of the three ratings that looked beyond
the means to the correlations on a trial-by-trial basis. Table 4
gives the trial-by-trial correlations pooled over conditions and
subjects. Seven of the nine correlations in Table 4 are between
-0.12 and +0.19; and the two highest correlations are still

Figure 5. Interaction between perspective transformation and dot
distribution in Experiments 1 and 2 (pooled).


rather low, with each relationship accounting for only about
12% of the variance. Although the subjects were affected
somewhat differently by some of the experimental factors, on
the whole, different subjects tended to use the ratings quite
similarly. Such low correlations between ratings clearly rule
out a simple, single factor interpretation that would require
high positive or high negative correlations between each pair
of ratings. For every subject, at least two of the three
interrating correlations are low. Obviously, the rated qualities of
the kinetic depth percept reflect at least two underlying
dimensions. Although the experiments were not designed in a way
that would expose the KDE depth percept to multidimensional
analysis, they were sufficient to bring this inherent
multidimensionality to the fore. The fact that different ratings
weigh differently on these dimensions cannot continue to be
overlooked in KDE research.
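
The "about 12% of the variance" figure is the square of the correlation coefficient. The sketch below shows both steps of such an analysis: a trial-by-trial Pearson correlation between two ratings, and the conversion of the two largest coefficients in Table 4 (.36 and .35) to proportion of variance accounted for. The rating lists are made-up illustrative values, not data from the experiments, and the function names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a rating with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of variance in one rating accounted for by the other."""
    return r * r


# Hypothetical 1-5 ratings from eight trials (illustration only).
rigidity = [5, 4, 5, 2, 3, 5, 1, 4]
coherence = [5, 5, 4, 3, 2, 5, 2, 3]
print("r = %.2f" % pearson_r(rigidity, coherence))

# The two largest correlations reported in Table 4.
for r in (0.36, 0.35):
    print("r = %.2f -> %.0f%% of variance" % (r, 100 * variance_explained(r)))
```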

Discussion

Wide Range of Percepts. The KDE for a multidot display
is quite rich. When viewing a stimulus with a small number
of dots, there generally are many possible stable percepts.
Even though the whole was geometrically derived from a rigid
object, perceptually, subgroups of dots form clusters, and each
subgroup appears to move independently in 3-D, acting as a
separate object. Groups of two or three dots can be perceived
as moving independently in the plane, or as a 3-D and rigid
configuration, or as a nonrigid 3-D configuration similar to
the Ames window (as in Gillam, 1975, 1976, in which line
segments were used rather than dots). In short, groups of dots


Table 4
Correlations Between Judgments: All Subjects

                          Experiment
Judgment types        1       2       3
Depth-rigidity       .05    -.12    -.01
Depth-coherence      .08     .07     .16
Rigidity-coherence   .36     .35

do not necessarily cohere as single objects, even when they
are being perceived in depth and when a unitary rigid 3-D
interpretation is available. Even when dots are correctly
perceived as parts of one coherent object, there is a range of
possible percepts that differ in shape, in depth, and in
perceived rigidity. The perceived coherence, shape, and rigidity
have a complex and partially decoupled relationship.

Decoupled Aspects of Percepts. Some degree of decoupling
between aspects of a KDE percept has been known since
Braunstein (1962) varied perspective in KDE displays and
observed an inverse effect on mean judgments of depth and
of combined rigidity/coherence. However, it has implicitly
been assumed that the manipulation of rigidity by perspective
was a special case. The current experiments demonstrate that
this decoupling between the various aspects of the percept is
not restricted to the independent variation of perceived
rigidity but is quite general because different factors affect different
judgments. In terms of mean ratings in our experiments, rated
depth was significantly affected by numerosity, distribution,
and perspective. Rated segmentation was affected primarily
by numerosity; this effect reflected a division between sparse
and dense levels of numerosity (above or below 16 elements).
Secondarily it was affected by form and perspective. Rated
rigidity was primarily affected by perspective and numerosity.
Additionally, correlations among the three ratings were low
and sometimes negative when measured on a trial-by-trial
basis.

Experiments  in  the  literature  on  multidot  KDE  have  used 
as the dependent measure either ratings or paired comparisons
on some judgment dimension (see Table 3). The judgment
dimensions either selected from a variant of the three ratings
used  here  or  combined  two  or  more  in  one  rating.  Conflation 
of  the  dependent  measure  may,  in  part,  explain  some  of  the 
inconsistencies  in  the  literature  noted  above.  In  particular, 
the  combined  coherence  and  rigidity  ratings  of  Braunstein 
(1962,  1977)  may  account  for  the  inconsistent  effect  of  per¬ 
spective  in  those  studies. 

Importance of Independent Factors. Our experiment
manipulated a number of factors within a subject, factors that
previously had been examined in separate experiments or had
been chosen arbitrarily as fixed factors that happened to differ,
along with the dependent measure, between studies in the
literature. Shape, distribution, and numerosity have usually
varied haphazardly between experiments. For example,
Braunstein (1962, 1977) found inconsistent patterns of
perspective on a rigidity-coherence judgment using 2-6 dots in
the volume of a cube and 1,000 dots in the volume of a
sphere, respectively. Braunstein (1962) and Petersik (1980)
found inconsistent patterns of perspective on depth judgments
using 2-6 dots in the volume of a cube and 4-43 dots in the
volume of a sphere, respectively. It has been difficult to know
whether the structural and numerosity factors explained the
inconsistency in patterns.

The results of our experiments can be viewed as filling in
Table 3 with a self-consistent set of data and providing
previously unavailable data in the empty cells. The results clearly
separate three important aspects of a kinetic depth percept:
depth (shape), coherence, and rigidity. Because our stimulus
parameters are manipulated within subject, cells are directly
comparable. A number of our results are similar to Green
(1961), Braunstein (1962, 1977), and others. In other cases,
in which inconsistent findings were reported, we suggested
explanations based on confounded dependent measures.
Typically, inconsistent results between KDE experiments result
from judgments that combine component aspects (e.g., depth,
coherence, rigidity) in unspecified, but probably different
weightings.

Nonmotion Cues to Depth in KDE Displays. There are
two classes of cues to object structure in our displays: static
cues such as density and 2-D object contour and dynamic or
motion cues that depend either on optic flow or on changing
interpoint distances. Based on our data, the greatest likelihood
of perceiving the veridical shape occurs with perspective
images of spheres (rather than cylinders), with high dot
numerosity, and with dots in the volume (rather than on the
surface).

High dot numerosity guarantees a good representation of
the 2-D contour of the sphere and of 2-D density cues. Even
when the 2-D contour is not as suggestive, as in the case of
the cylinder, the density cues that become visible with high
numerosity may be important in providing static cues to
shape. (In the case of surface distribution of elements, the
2-D density will increase toward the edges, whereas in volume
distribution of elements, the density cue is reversed, as in
Figure 1.) The presence of 2-D cues to shape, whether from
contour or density, may constrain the perception. High
element numerosity also minimizes the likelihood of atypical
clumping or grouping characteristics likely in low numerosity
figures, which then are likely to cause grouped or segmented
(i.e., incoherent) percepts.
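
The parenthetical remark about 2-D density can be illustrated numerically for a cylinder seen in parallel projection. In the sketch below, dots are placed uniformly either around the circumference or across the circular cross-section, and the number of dots falling in narrow bands at the image edges is compared with the number in an equally wide band at the center. The function names, band width, and sample size are assumptions chosen for illustration.

```python
import math
import random


def projected_x(n_dots, in_volume, radius=1.0, rng=None):
    """Horizontal image positions of cylinder dots under parallel projection."""
    rng = rng or random.Random(2)
    xs = []
    for _ in range(n_dots):
        theta = rng.random() * 2.0 * math.pi
        # Volume dots: uniform over the cross-section; surface dots: full radius.
        r = radius * math.sqrt(rng.random()) if in_volume else radius
        xs.append(r * math.sin(theta))
    return xs


def edge_to_center_ratio(xs, radius=1.0, half_band=0.1):
    """Dots within half_band of either edge, relative to dots within
    half_band of the center line (both regions have equal total width)."""
    edge = sum(1 for x in xs if abs(x) > radius - half_band)
    center = sum(1 for x in xs if abs(x) < half_band)
    return edge / center


surface_ratio = edge_to_center_ratio(projected_x(20000, in_volume=False))
volume_ratio = edge_to_center_ratio(projected_x(20000, in_volume=True))
print("surface edge/center: %.2f" % surface_ratio)  # greater than 1
print("volume edge/center:  %.2f" % volume_ratio)   # less than 1
```

With surface dots the projected density piles up toward the edges, and with volume dots it is highest near the center, matching the reversal described in the text.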

Perspective may simply serve as an additional cue to depth
organization. Alternatively, the exaggerated perspective used
here may be especially effective because it slightly increases
the proportion of elements moving in the same direction,
yielding a display similar to that arising from an image with
occlusion, and possibly allowing stronger input to an optic
flow analysis at high numerosity (J. Todd, personal
communication, March 1987).

Distribution  in  the  volume  provides  a  range  of  velocities  in 
any  local  area  and  may  support  a  full  depth  percept  by  relating 
distance  from  the  axis  of  rotation  to  dot  velocity.  Dot  fields 
of  different  velocity,  whether  adjacent  or  superimposed,  tend 
to  segregate  in  depth  (Braunstein  &  Andersen,  1981). 

Percept  Description  Versus  Objective  Task  Measures.  An 
alternative  to  the  measurement  of  one  or  another  aspect  of 
the  kinetic  depth  percept  by  rating  is  to  conceptualize  a 
different  sort  of  question.  In  rating,  we  ask  about  various 
aspects  of  the  percept  itself.  An  alternative  is  to  ask  whether 
a  percept,  whatever  its  subjective  appearance,  is  adequate  to 
support  objective  performance  on  a  particular  kind  of  judg¬ 
ment,  such  as  a  judgment  of  shape.  Some  attempts  have  been 
made  in  this  regard.  For  example,  Todd  (1984)  required 
subjects  to  make  objective  curvature  judgments  under  various 
levels of nonrigidity in the kinetic depth image. Lappin et al.
(1980) required subjects to make objective, paired-comparison
judgments of the degree of correspondence in two-frame
displays. We investigated one possible objective measure of
having perceived shape from a kinetic depth display (Dosher,
Landy, & Sperling, in press; Landy, Sperling, Dosher, &
Perkins, 1987; Sperling, Landy, Dosher, & Perkins, 1989).
This objective measure requires subjects to identify the object
perceived from among a large lexicon of possible objects and
offers an attractive alternative to the elaboration of subjective
methods under study here.

Relation to Models. Three classes of computational
models have been proposed to account for the kinetic depth
effect based on motion cues: those deriving shape from optic
flow fields (Clocksin, 1980; Koenderink & van Doorn, 1986),
those deriving analytic solutions by assuming rigidity from m
views of n points (Hoffman & Bennett, 1985; Ullman, 1979),
and those based on maximizing rigidity in interpoint distances
(Hildreth & Grzywacz, 1986; Landy, 1987; Ullman, 1979,
1984). Usually, flow-field models are applied to objects composed
of densely packed points, and interpoint-distance
models are applied to images composed of fewer than a few
dozen points. Interpoint-distance models apply geometric
computations to the 2-D image-plane positions of the given
points to compute a 3-D object that is either totally rigid
(Ullman, 1979) or a 3-D object that deforms minimally
between adjacent frames (Landy, 1987; Ullman, 1984).

We will illustrate the problems that rigidity models have
with data such as ours by considering, as an example, the
incremental rigidity algorithm of Ullman (1984). When an
n-point 3-D object undergoes rotation, the algorithm takes as
its input a sequence of frames that represent the 2-D
image-plane x, y projections of the n points. For each frame, the
algorithm outputs an estimated depth value z for each point
plus one overall fidelity score. The computation consists of a
gradient descent in the space of depth values z to maximize
the fidelity criterion. This criterion measures the deformation
(nonrigidity) in the recovered 3-D object between the current
frame and the prior frame.

To evaluate such an algorithm as a psychological model,
one must associate quantities produced by the algorithm with
aspects of human perception. We have shown here three
aspects of performance that are partially separable in performance
measures: segmentation, depth, and rigidity. Consider
what happens in the case of four-point displays. Ullman's
(1984) algorithm, like most others, simply assumes element
correspondence and figural segmentation as prior processes.
For four-point objects, the incremental rigidity algorithm
would have recovered the veridical single object with rigid
depth assignments for all nonperspective images in the experiment.
On the other hand, only 1 of our 4 subjects regularly
perceived four-point displays as unitary; for the other subjects,
these were usually perceived as two or more objects moving
independently. This grossly violates the segmentation assumed
by the Ullman model.

The algorithm's estimated depth values seem a plausible
basis for predicting human depth judgments, and the algorithm's
fidelity score seems a plausible basis of rigidity judgments.
An immediate problem is that perspective-dependent
modulations in position of elements on the image plane are
treated as noise by this (and most other) algorithms (Sperling
& Dosher, 1987), although our subjects' depth percepts are
improved by moderate amounts of perspective. The problems
surrounding predictions with parallel and perspective projections
are particularly enlightening. Detailed consideration
(Dosher & Sperling, 1988; Sperling & Dosher, 1987) showed
that a class of models including Ullman's (1984) exhibit
unrealistic properties because they do not incorporate any
perspective transformation, whereas the models would require
a flexible perspective transformation to deal with perceptual
facts. Parallel perspective algorithms, such as Ullman's, when
applied to perspective images, such as those in our experiments,
yield flattened depth estimates in relation to nonperspective
images. On the contrary, for our human observers,
perspective increased depth ratings slightly.

Predictions of perceived rigidity based on the fidelity criterion
are the most problematic aspect of Ullman's (1984)
algorithm. When the image is produced by perspective transformation,
parallel-perspective rigidity algorithms (e.g., Ullman)
cannot distinguish between veridical and reversed depth
3-D recovered objects: they yield precisely equally rigid and
nonrigid solutions. For our subjects, reversed depth perceptions
are grossly more nonrigid than the veridical ones, a
powerful perceptual fact that is beyond the scope of this class
of models.

Purely geometric algorithms that yield explicit solutions to
3-D objects given m views of n points (Bennett & Hoffman,
1985; Hoffman & Bennett, 1985; Ullman, 1979; Webb &
Aggarwal, 1981) fare much worse than the incremental rigidity
algorithm. Again, segmentation is simply assumed. The
algorithms yield exact solutions under certain conditions in
which the stimuli represent rigid objects. The outputs here
are exact solutions or the fact that a solution failed. An exact
solution must be rigid, so the model cannot predict any
particular nonrigid percept, nor does it have computational
by-products that can support rigidity-nonrigidity judgments,
partial depth, or incomplete segmentation.

We conclude that, although many existing algorithms are
of great interest as a possible basis for robotics solutions to
the structure-from-motion problem, they are inadequate as
psychological models. Our experiments suggest that a successful
psychological model must identify at least three separable
aspects of recovered objects that can serve as a basis for
the three separable, measurable aspects of kinetic depth
perception.
ABSTRACT We demonstrate two kinds of visual stimuli
that exhibit motion in one direction when viewed from near and
in the opposite direction from afar. These striking reversals
occur because each kind of stimulus is constructed to
simultaneously activate two different mechanisms: a short-range
mechanism that computes motion from space-time
correspondences in stimulus luminance and a long-range mechanism in
which motion computations are performed, instead, on
stimulus contrast that has been full-wave rectified (e.g., on the
absolute value of contrast).


We demonstrate two dynamic visual stimuli that appear to
move in one direction when viewed from near and in the
opposite direction from afar. This remarkable reversal of
apparent motion occurs because the stimuli are constructed
to simultaneously activate two different mechanisms: a
first-order mechanism that computes motion from space-time
correspondences in raw stimulus luminance and a
second-order mechanism that uses, instead, a full-wave
rectified transformation (e.g., the absolute value) of stimulus
contrast to compute motion.

The first stimulus, B, a rightward-stepping, contrast-reversing
bar, is a variant of Anstis's (1) reversed-phi
stimulus. What we add are quite different explanations of the
ordinary and the reversed motions in this stimulus and the
conditions under which each is perceived.

The second stimulus, Γ, a stepping, contrast-reversing
grating, is an elaboration of the first with two useful properties:
(i) It provides the first- and second-order systems with
motion signals of identical spatial frequency, moving at the
same rate, but in opposite directions; and (ii) its motion
direction is totally ambiguous to any half-wave rectifying
system. The dominance of the first-order mechanism when
the retinal image is small (far-viewing) suggests that it is the
mechanism of Braddick's (2) short-range system; the dominance
of the second-order mechanism with large retinal
images suggests that it is the mechanism of the long-range
system.

Since Braddick (2) proposed that there are two motion
perception mechanisms with different properties, a short-range
and a long-range motion-perception system, the issue
has been intensely investigated (3-16). The following differences
between the short-range and long-range systems are
proposed. The short-range system requires successive stimuli
to be displaced in space by a small distance Δx within a
small time period Δt and presented to the same eye. The
long-range system tolerates large Δx, Δt, and interocular
presentation (2, 12).

Anstis and Mather (16) noted that in making its matches
across time and space, the long-range system is indifferent to
sign of contrast. Motion is generated between successively
displayed, spatiotemporally displaced points on a grey background,
even when they are of opposite contrast polarity


(i.e., one is white and the other black). Quite the reverse is
true of the short-range system. The sensitivity of the short-range
system to the sign of contrast is exhibited strikingly in
the phenomenon of reversed-phi apparent motion (1): When
a picture is flashed twice in quick succession, with the second
flash subtly displaced in space from the first, motion (called
phi motion) is perceived in the direction of the displacement.
However, if the contrast of the picture is reversed between
the first and second flash, motion may be perceived in the
direction opposite to the displacement. This is reversed-phi
motion.

What has been lacking is a clear specification of the
mechanisms governing the short- and long-range systems.
Here we introduce two stimuli, the contrast-reversing bar B
(Fig. 1d) and the stepping, contrast-reversing grating Γ (see
Fig. 2a) that display short-range (reversed-phi) motion to the
left when viewed from far away and long-range motion to the
right when viewed from a short distance. Γ is constructed so
as to place important constraints on the underlying mechanisms
that detect the motion it displays from both far and near
viewing distances. Specifically, Γ rules out the possibility
that either sort of motion is mediated by half-wave rectification.
Rather, Γ strongly suggests that the short-range
system applies what we shall call standard motion analysis to
raw stimulus luminance, while the particular long-range
system stimulated by Γ from short viewing distances applies
standard motion analysis to a full-wave rectified transformation
of stimulus contrast.

A monochromatic visual stimulus is a function that assigns
a luminous flux to each point in space-time. However, from
a perceptual point of view, a stimulus is better described by
its contrast than by its luminance l. Thus, a stimulus S is the
normalized deviation of l(x, y, t) from its mean luminance $l_0$;
that is, for any point x, y, t in space-time,
$S(x, y, t) = [l(x, y, t) - l_0]/l_0$. Because a stimulus is defined
in terms of the contrast-modulation function S (rather than the
raw luminance function l), stimulus values (unlike luminance
values) may be positive or negative.
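
As a minimal sketch of this definition, the following Python function converts sampled luminance into contrast. The representation (a list of frames, each a list of samples along x) and the example luminance values are assumptions made for illustration; they are not taken from the experiments.

```python
def contrast(luminance):
    """Convert luminance samples l into contrast S = (l - l0) / l0.

    `luminance` is a list of frames, each a list of luminance values
    along x. l0 is taken to be the mean over all samples.
    """
    samples = [value for frame in luminance for value in frame]
    mean = sum(samples) / len(samples)
    return [[(value - mean) / mean for value in frame] for frame in luminance]


# Made-up luminances in arbitrary units; the mean luminance l0 is 100.
frames = [[50.0, 150.0], [100.0, 100.0]]
print(contrast(frames))  # [[-0.5, 0.5], [0.0, 0.0]]
```

Samples darker than the mean come out negative and brighter samples positive, which is the property the later rectification arguments rely on.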

To simplify the discussion, we consider only stimuli that do
not vary in the vertical dimension, i.e., stimuli that can be
described as horizontally moving patterns of vertically oriented
bars. Any such vertically constant stimulus is characterized
in all relevant respects by its xt cross-section S(x, t),
a slice made perpendicular to the vertical axis of space to
reveal stimulus contrast as a function of horizontal space (x)
and time (t).

Fig. 1a depicts eight frames of a movie of a dark vertical bar
stepping left-to-right across a bright field. Fig. 1b is the xt
cross-section of the rightward-stepping dark bar. Fig. 1c
shows an xt cross-section of a rightward-drifting, vertically
oriented sine-wave grating A(x, t) = sin(x - t). This sine-wave
component of b is shown superimposed on b. Fig. 1c
illustrates how the detection of motion in a complex stimulus
can be understood in terms of motion of the sine-wave
components.

It is immediately obvious from the xt cross-sections of the
rightward-stepping bar and sine-wave stimuli that the problem
of detecting motion in xt is equivalent to the problem of
detecting orientation in xy. That is, the perception of an xy
pattern slanting down to the right is analogous to the
perception of an xt pattern moving to the right.

Fig. 1. Slant in x and y corresponds to motion in xt. (a) Eight
frames in a display of a rightward-stepping, vertical bar: x and y
represent the spatial dimensions of the display, and t represents time.
This stimulus does not vary in the y dimension. Each of panels b-g
is an xt cross-section of a dynamic stimulus that does not vary in y.
(b) An xt cross-section of the rightward-stepping bar of panel a.
Horizontal luminances are indicated along x; temporal luminances
are indicated vertically with time t running downward. (c) Stimulus
of b shown together with one of its largest sinusoidal components. (d)
The Gaussian-windowed, contrast-reversing stepping bar, stimulus
B. (e) Stimulus B shown together with its largest sinusoidal component,
indicating why its far-view (first-order, Fourier) motion is to the
left. (f) A row of vertical bars, randomly of positive or negative
contrast, the amplitude of which is modulated by a rightward-drifting
grating. (g) A row of random, black/white vertical bars the flicker rate
of which is modulated by a rightward-drifting grating. The motion of
all four stimuli is as obvious to all viewers as is the slant of the x, y
cross-section. The rightward motion of the black bar (b and c), and

Fig. 1d shows the contrast-reversing bar B, which is
Gaussian-windowed in time. When the bar takes 60 steps per
sec (one step every 17 msec) and is windowed by a Gaussian
function with SD of 25 msec, every observer so far has
reported the direction of motion as being to the right when
viewed in central vision from a wide range of near distances.
On the other hand, B appears to be moving leftward when
viewed in peripheral vision from near or when viewed in
central vision over a smaller range of distances near
the vanishing point. Fig. 1e suggests the Fourier basis of the
far-view motion: the dominant sine-wave components are
leftward (17). We momentarily defer the explanation of
rightward movement.

Visual slant detection (often called orientation detection) is
generally thought to involve oriented Hubel-Wiesel (18)
receptive fields in area 17 of the visual cortex. The corresponding
computational mechanisms are oriented linear filters
(19, 20). The detection of slant, however, involves a
further (inherently nonlinear) stage of processing: A decision
about the dominant slant of a spatial stimulus S must be made
with reference to the relative energy in the responses-to-S of
various linear filters in different phases and orientations. A
wide range of models to explain slant (and motion) perception
apply computations of this sort to the visual stimulus (17,
21-28), and similar computations are coming to have wide
applications in robotic vision (29, 30).

Although exclusively spatial detectors are physically different
from spatiotemporal detectors, the computations for
orientation-detection and motion-detection are quite similar.
For both slant and motion, the quantity computed by any
energy-analytic detector can be cast as a linear combination
of the pairwise products of stimulus values,
$S(x_i, t_i)\,S(x_j, t_j)$,
for i and j both ranging over all points in space-time. (For
slant detectors the time variables $t_i$ and $t_j$ are replaced by
vertical space variables $y_i$ and $y_j$.) We refer to computations
of this sort as standard motion (or slant) analysis.

Let $D_1$ be a standard motion analyzer defined for any
stimulus S by

$D_1(S) = \sum_{i,j} W_{ij}\, S(x_i, t_i)\, S(x_j, t_j)$,   [1]

where each $W_{ij}$ is a real-valued weight. The standard motion
analyzer tuned to the same sort of motion as $D_1$, but in the
opposite direction, is

$D_2(S) = \sum_{i,j} W_{ij}\, S(x_i, t_j)\, S(x_j, t_i)$.   [2]

Any stimulus S is called microbalanced if and only if for any
such oppositely tuned standard motion analyzers, $D_1$ and $D_2$,
the expected response $E[D_1(S)]$ is equal to the expected
response $E[D_2(S)]$ (31).
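
The analyzer pair of Eqs. 1 and 2 can be sketched on a sampled stimulus as follows. This is an illustrative sketch only: the weight set (unit weight on each pairing of a point with its right-hand neighbour one frame later), the small stepping-bar stimulus, and the random sign per frame are assumptions chosen to keep the example tiny. The expectation is computed exactly by enumerating every sign sequence, and it checks balance for this one analyzer pair, not for all pairs as the definition of microbalance requires.

```python
from itertools import product


def analyzer_pair(stimulus, weights):
    """Return (D1, D2) for stimulus[t][x] and weights {((xi, ti), (xj, tj)): w}."""
    d1 = 0.0
    d2 = 0.0
    for ((xi, ti), (xj, tj)), w in weights.items():
        d1 += w * stimulus[ti][xi] * stimulus[tj][xj]  # Eq. 1
        d2 += w * stimulus[tj][xi] * stimulus[ti][xj]  # Eq. 2: times swapped
    return d1, d2


def rightward_weights(width, n_frames):
    """Hypothetical weights: pair (x, t) with (x + 1, t + 1), weight 1."""
    return {((x, t), (x + 1, t + 1)): 1.0
            for t in range(n_frames - 1) for x in range(width - 1)}


def stepping_bar(width, n_frames, signs=None):
    """A bar at x = t in frame t, with an optional contrast sign per frame."""
    signs = signs or [1] * n_frames
    return [[signs[t] if x == t else 0 for x in range(width)]
            for t in range(n_frames)]


def expected_responses(width, n_frames, weights):
    """Exact E[D1], E[D2] when each frame's sign is +1 or -1 with equal odds."""
    combos = list(product((-1, 1), repeat=n_frames))
    total1 = total2 = 0.0
    for signs in combos:
        d1, d2 = analyzer_pair(stepping_bar(width, n_frames, list(signs)), weights)
        total1 += d1
        total2 += d2
    return total1 / len(combos), total2 / len(combos)


weights = rightward_weights(5, 4)
print(analyzer_pair(stepping_bar(5, 4), weights))  # fixed-contrast bar
print(expected_responses(5, 4, weights))           # randomly signed bar
```

For the fixed-contrast bar the two analyzers disagree (the rightward one responds, its opposite does not), whereas for the randomly signed bar their expected responses coincide, even though every realization still contains a bar stepping rightward.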

Although, as this definition indicates, microbalanced random
stimuli yield no signs of systematic motion to standard
motion analysis, it is nonetheless possible to construct a wide
variety of microbalanced random stimuli that display consistent
motion across independent realizations (31, 32). For
example, the amplitude-modulated noise stimulus I and the
frequency-modulated noise stimulus J in Fig. 1 f and g (31) are
microbalanced. Nonetheless, observers universally perceive

the leftward far-view motion of the contrast-reversing bar B (d and
e) are accessible to first-order mechanisms; the rightward motion of
stimuli I (f) and J (g), and the rightward near-view motion of B (d) are not.
The motion of stimulus I can be exposed to standard analysis by
simple half- or full-wave rectification; stimulus J requires a temporal
linear filter (e.g., a temporal differentiator) before rectification.
the dynamic versions of these stimuli as moving rightward,
and the texture versions as slanted downward to the right.

Following Cavanagh,† we call any motion mechanism that
applies standard motion (or slant) analysis directly to luminance
(or to a linear transformation of luminance) a first-order
mechanism. Any motion mechanism that applies standard
motion (or slant) analysis to a grossly nonlinear transformation
of luminance is called a second-order mechanism.
There is a simple way to expose the inherent motion (or
slant) in stimuli such as I and J: (i) apply a temporal (or
vertical) linear filter, (ii) rectify the result, and (iii) apply
standard analysis.‡ There are two candidate schemes of
rectification: full-wave rectification, which consists of computing
the absolute value (or a monotonically increasing
function of the absolute value) of the filtered contrast, and
half-wave rectification, which consists of making independent,
separate computations on positive and on negative
values of filtered contrast. Full-wave rectification has a long
history of utility in signal processing. Half-wave rectification
appears to be a widespread, almost universal, physiological
process: Because neurons have only a positive output (their
firing frequency), they are paired in order to economically
convey positive and negative signal values. In the visual
system, one pair-member (an "on-center" neuron) carries
values of positive contrast, whereas its pair-mate (an "off-center"
neuron) carries negative contrast values.

The phenomenon of reversed-phi motion (1) demonstrated
in the far viewing of Fig. 1d [and many similar results (17)]
could not occur if the short-range system applied a full-wave
rectifier before standard motion analysis. Simple full-wave
rectification of contrast obliterates the difference between
the simple moving bar (Fig. 1b) and corresponding contrast-reversing
bar (B, Fig. 1d). Any mechanism that full-wave
rectified contrast before motion analysis would issue similar
responses for the stimuli of Fig. 1 b and d.
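
The two rectification schemes, and the point that full-wave rectification erases the difference between a plain and a contrast-reversing stepping bar, can be sketched as follows. The stimuli here are simplified assumptions: unit-contrast bars one sample wide, no Gaussian window, and contrast reversing on every step.

```python
def full_wave(frames):
    """Full-wave rectification: absolute value of contrast."""
    return [[abs(v) for v in frame] for frame in frames]


def half_wave(frames):
    """Half-wave rectification: separate positive and negative components.

    Both components are returned as non-negative magnitudes, in the
    manner of paired on-center and off-center channels.
    """
    positive = [[max(v, 0) for v in frame] for frame in frames]
    negative = [[max(-v, 0) for v in frame] for frame in frames]
    return positive, negative


def stepping_bar(width, n_frames, reverse_contrast=False):
    """A bar at x = t in frame t; optionally flips sign on odd frames."""
    frames = []
    for t in range(n_frames):
        sign = -1 if (reverse_contrast and t % 2) else 1
        frames.append([sign if x == t else 0 for x in range(width)])
    return frames


plain = stepping_bar(6, 4)
reversing = stepping_bar(6, 4, reverse_contrast=True)

print(plain == reversing)                        # False: the raw stimuli differ
print(full_wave(plain) == full_wave(reversing))  # True: identical once rectified
```

The half-wave components of the reversing bar, by contrast, each contain the bar only on alternate frames, so they remain different from those of the plain bar.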

These considerations do not, however, rule out the possibility
that the short-range system uses a half-wave rectifier
before standard motion analysis (33). Perhaps both short-range
motion and the motion of various microbalanced
random stimuli such as I (Fig. 1f) and J (Fig. 1g) can be
explained with reference to a single kind of mechanism, one
that applies to stimulus contrast a linear filter, then a
half-wave rectifier, and finally some form of standard motion
analysis. Or perhaps, as seems more likely, short-range
motion results from applying standard motion analysis directly
to contrast. In this case, we are left with the question
of what sorts of rectification are involved in perceiving the
motion of microbalanced random stimuli.

These issues are cleared up by the leftward-stepping,
contrast-reversing grating Γ defined in Fig. 2. An xt
cross-section of Γ is shown in Fig. 2a. The temporal scale and the
(distance-dependent) spatial scale of the display are described
in the legend for Fig. 2. Γ is perceived to move
leftward from near viewing distances and rightward from far
distances. Γ has been viewed by dozens of subjects in our lab,
and the reversal of apparent motion with viewing distance has
been observed by all.

The far-view motion of Γ is detected by the short-range
system. Note that in each successive display, Γ is shifted 1/4
spatial cycle leftward, and its contrast is reversed. Thus, we
should expect Γ to elicit reversed-phi motion under appropriate
conditions. And indeed, when the spatial displacement
between successive displays is made sufficiently small by
moving the viewer back from the screen, Γ exhibits reversed-phi
†Cavanagh, P., Conference on Visual Form and Motion Perception:
Psychophysics, Computation, and Neural Networks, March 5,
1988, Boston University, Boston, MA.

‡Rectification alone suffices to expose the motion of I to standard
analysis; temporal differentiation and rectification are required for J.


Fig. 2. Graphic analysis of the motion content of stimulus Γ, a
horizontally windowed, leftward-stepping grating of vertical bars
that reverses contrast with each step. (a) An xt cross-section of Γ. Γ
is temporally periodic. The temporal slice displayed here contains
eight frames, each of which lasts 1/15 sec; thus, the total duration
shown is 533 msec. From far (8 m), the width of a is ≈0.6 degrees
visual angle (dva), and each vertical bar in the grating has a width of
[illegible] dva. (b) A sinusoid is overlaid on Γ to illustrate the perceived
motion of Γ when viewed from 8 m. Conformity to sinusoidal analysis
suggests that the far-view motion of Γ is first-order. (c) |Γ|, the
absolute value (full-wave rectified) transformation of Γ. From near (2
m), the stimulus Γ displays motion conforming to the sinusoid
overlaid on |Γ|, suggesting that the near-view motion of Γ is
second-order and possibly mediated by full-wave rectification of
stimulus contrast.

motion to the right, implicating the short-range system.
The velocity of this far-view motion is easily distinguished by
all subjects and is equal to that of the grating overlaid on Γ in
Fig. 2b. As this overlay makes clear, the far-view motion of
Γ is signaled directly by the distribution of energy in the
Fourier transform of Γ. Typically, standard motion-analytic
computations reflect this distribution of Fourier energy in the
stimulus. Thus, the far-view motion of Γ is the predicted
response of a first-order mechanism.

By contrast, the near-view motion of Γ is detected by a
second-order mechanism. It is evident to all viewers that the
leftward motion displayed by Γ from short viewing distances
is carried directly by the leftward-stepping, contrast-reversing,
vertical bars. However, Γ has no energy in any Fourier
component (drifting sinusoidal grating) whose velocity
matches that of these leftward-stepping bars. This indicates
that the near-view motion of Γ is not obtained directly by
standard analysis. We can, however, expose the near-view
motion of Γ by full-wave rectifying Γ before standard motion
analysis. This is illustrated by Fig. 2c, in which |Γ| is shown,
overlaid by a leftward-drifting grating that contributes strongly
to it. The velocity of this sinusoid is precisely the velocity of
the near-view motion of Γ.
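
The opposite directions signaled by Γ before and after full-wave rectification can be checked numerically with a small sketch. Several things here are assumptions rather than the stimulus actually used: each spatial cycle is sampled at four points with a single unit-contrast bar on a zero-contrast background, the horizontal window is omitted, x wraps around, and "standard analysis" is reduced to one opponent correlator that compares each frame with the next at a displacement of one sample (1/4 cycle).

```python
PERIOD = 4  # samples per spatial cycle; one sample equals 1/4 cycle


def gamma_frames(n_frames=8, n_cycles=4):
    """Assumed stand-in for the grating: steps 1/4 cycle leftward and
    reverses contrast on every frame."""
    width = PERIOD * n_cycles
    frames = []
    for t in range(n_frames):
        sign = -1 if t % 2 else 1
        frames.append([sign if (x + t) % PERIOD == 0 else 0
                       for x in range(width)])
    return frames


def opponent_score(frames):
    """Rightward minus leftward correlation between successive frames.

    Positive means net rightward motion energy, negative net leftward,
    zero no preference at this displacement.
    """
    width = len(frames[0])
    score = 0
    for earlier, later in zip(frames, frames[1:]):
        for x in range(width):
            score += earlier[x] * later[(x + 1) % width]  # rightward pairing
            score -= earlier[x] * later[(x - 1) % width]  # leftward pairing
    return score


raw = gamma_frames()
rectified = [[abs(v) for v in frame] for frame in raw]
positive_half = [[max(v, 0) for v in frame] for frame in raw]
negative_half = [[max(-v, 0) for v in frame] for frame in raw]

print(opponent_score(raw))            # 28: rightward (reversed-phi)
print(opponent_score(rectified))      # -28: leftward
print(opponent_score(positive_half))  # 0
print(opponent_score(negative_half))  # 0
```

Under these assumptions the raw contrast and its absolute value give signals of equal size and opposite sign, and each half-wave component alone gives this correlator nothing to work with, because its bars appear only on alternate frames and then jump half a cycle.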

There are other transformations aside from simple full-wave
rectification that might expose the near-view motion of
Γ to standard analysis. The most likely transformations (31,
32, 34) involve an initial stage of temporal linear filtering.
Plausible candidates are filters whose response at every point
(x, y) in space depends on (i) average recent stimulus contrast
at that point and/or (ii) recent changes in contrast at that
point. In particular, the likely temporal filters are marked by
brief impulse responses (most of their energy confined to
<100 msec) that (i) integrate to a nonzero value (so as to
reflect raw stimulus contrast) and/or (ii) are biphasic (so as
to register quick changes in contrast). Some candidate
impulse responses are plotted in Fig. 3 a-c.

What distinguishes the leftward-stepping, contrast-reversing
grating Γ from other stimuli that reverse direction of motion
with viewing distance (34) is that, for all of these empirically
plausible temporal linear filters f* (e.g., with impulse response
f conforming to Fig. 3 a, b, or c), the result of half-wave
rectifying f * Γ is completely ambiguous in motion content.
Half-wave ambiguity of Γ and its transformations is illustrated
in Fig. 3. The filter g*, whose impulse response g is
shown in Fig. 3a, is a physiologically plausible representation
of the identity transformation; g* averages recent contrast
but does not register sudden changes in contrast. The filter
k*, whose impulse response k is shown in Fig. 3c, is a
Fig. 3. Exposing the near-view motion of Γ to standard analysis.
The vertical dimension in all panels is time t, running downward. The
scale of t is constant throughout the figure (see below). Each panel
in the first row represents the impulse response f of a temporal filter
that is an empirically plausible initial stage of a rectifying, second-order
motion mechanism. The horizontal axis of each panel in the
first row indicates intensity, increasing left-to-right. (a) Impulse
response of a physiologically plausible approximation to the temporal
identity; it averages recent stimulus contrast. (c) A physiologically
plausible approximation to a temporal differentiator; it responds
only to temporal changes in contrast. (b) The average of
responses of filters a and c, a physiologically plausible compromise
between temporal differentiator and identity that indicates both
recent changes in contrast and recent average contrast. The panels
(a1-c4) are xt cross-sections; the horizontal axes indicate horizontal
space, the vertical axes indicate time. Each grey panel spans 2.4° of
visual angle horizontally at a viewing distance of 2 m and spans 533
msec (vertically). In the row f * Γ and the column under each impulse
response is an xt cross-section of the result of applying filter f to Γ (Fig.
2a). Subsequent rows indicate the result of rectifying f * Γ. H+(f *
Γ) and H-(f * Γ) indicate the positive and negative half-wave
components of the same-column linear transformation, and the row
marked |f * Γ| shows full-wave rectifications of these temporal
filterings of Γ. All half-wave components are ambiguous in motion
content; all full-wave rectifications yield unambiguous leftward
motion to standard analysis.


physiologically  plausible  approximation  lo  a  temporal  differ- 
cntialor  (A*  registers  temporal  changes  in  conlrast,  without 
keeping  track  of  average  recent  contrast).  The  best-of-both* 
worlds  filter  m*  has  impulse  rcsrwnse  «  « (g  +  A)/2  shown 
in  Fig  3b  The  reason  for  including  this  bcsl-of-boih*worIds 
filter  is  that  among  the  stimuli  that  display  second-order 
motion  mediated  by  temporal  filtering,  there  are  some  for 
which  g*  (Fig.  3fl)  works  but  not  k*  (Rg.  3c),  and  some  for 
which  k*  (Fig.'3c)  works  but  not  g*  (Fig.  3a);  however,  m* 
(Fig.  3b)  works  for  ail  (32). 


In Fig. 3, the top row of xt cross-sections (marked f * Γ) displays the
result of applying each of the filters directly to Γ. The rows marked
H+(f * Γ) (Fig. 3 a2, b2, and c2) and H-(f * Γ) (Fig. 3 a3, b3, and c3)
display the positive and negative half-wave components of the
same-column filtered outputs (Fig. 3 a1, b1, and c1), and the row
marked |f * Γ| displays full-wave rectifications (Fig. 3 a4, b4, and c4)
of the filter outputs. The important fact graphically illustrated here
is that the half-wave components of all of these linear transformations
of Γ are completely ambiguous in motion content. As Fig. 3 a4, b4, and
c4 make clear, full-wave rectification works to expose the near-view
motion of Γ; however, almost any full-wave-like rectification that
combines same-sign output for positive and negative signal components
will also work.

The distance-driven reversal of the apparent motion displayed by the
leftward-stepping, contrast-reversing grating Γ (Fig. 2a) makes it
dramatically clear that, as many have observed (2, 16, 30-38), the
visual system extracts motion information from the visual signal in more
than one way. Fig. 2 b and c illustrate that the far-view motion of Γ is
consonant with a first-order mechanism (i.e., a Fourier mechanism that
applies some form of standard motion analysis directly to the
untransformed stimulus), whereas the near-view motion of Γ implicates a
second-order mechanism that applies standard motion analysis to a
rectified transformation of Γ (e.g., in Fig. 2c). In the context of the
various stimuli we have been able to create, the motion of which
reverses with distance (33), the specific importance of Γ derives from
the fact that the near-view motion of Γ cannot be exposed to standard
motion analysis by any of the empirically plausible linear filters
followed by half-wave rectification, whereas full-wave rectification
works in conjunction with all the plausible filters.

It is possible to construct stimuli the motion of which is accessible
neither to first-order mechanisms nor to any of the second-order
mechanisms considered here (32). The question remains open as to whether
any of the mechanisms that detect these other sorts of motion use
half-wave rectification. However, the leftward-stepping,
contrast-reversing grating Γ conclusively establishes that at least one
second-order mechanism uses full-wave rectification.

This work was supported by Air Force Office of Scientific Research,
Life Sciences Directorate, Vision Information Processing Program,
Grants 85-0364 and 88-0140.
APPARENT MOTION DERIVED FROM SPATIAL TEXTURE
George Sperling and Charles Chubb. Human Information Processing
Laboratory, New York University, New York, NY 10003.

Texture quilts are dynamic stimuli designed for studying
motion-from-spatial-texture without contamination by motion mechanisms
sensitive to other aspects of the signal. Here we provide a theoretical
foundation and concrete stimulus-construction methods. We demonstrate
texture quilts that exhibit strong apparent movement but whose motion
content is unavailable to standard motion analysis such as might be
accomplished by an Adelson/Bergen motion-energy analyzer, a
Watson/Ahumada motion sensor, or by any Reichardt detector. Furthermore,
the following transformations leave the motion in texture quilts
unavailable to standard motion analysis: (a) any linear space-time
separable transformation or (b) any purely temporal transformation, no
matter how nonlinear (e.g., rectifying a temporal derivative). Applying
(a) or (b) to a texture quilt results in a spatiotemporal function P
(not necessarily a texture quilt) that is again microbalanced; that is,
its motion is unavailable to standard motion analysis. The simplest
mechanism sufficient to sense the motion exhibited by texture quilts
consists of three successive stages: (i) a purely spatial linear filter
(the "texture grabber"), (ii) a rectifier to transform regions of
high-energy filter response into regions of high average value, and
(iii) standard motion analysis. Stimuli of a still higher order that
require a re-iteration of stages (i) and (ii) to yield motion will be
demonstrated.
Erratum

The wrong abstract was printed on page 161, no. 11 of the March, 1989 Vol. 30, No. 3 Supplement to
Investigative Ophthalmology and Visual Science. The correct abstract was actually presented and is printed
below.

TEXTURE INTERACTIONS DETERMINE APPARENT LIGHTNESS
Charles Chubb, George Sperling, and Joshua A. Solomon.
We demonstrate that for a test patch of binary spatial noise P
embedded in a surrounding noise field S, the perceived contrast of P
depends substantially on the contrast of the noise surround S. When
P is surrounded by high-contrast noise, its bright points appear
dimmer, and simultaneously, its dark points appear less dark than
when P is surrounded by a uniform field, even though local mean
luminance is kept constant across all displays. Sinusoidally
modulating the contrast C_S of the noise surround S causes the
apparent contrast of P to modulate in antiphase to C_S. In a nulling
experiment, C_S was modulated between 0 and 1 at 0.47 Hz. For noise
patches P of mean contrast between OJ and OJ, the amplitude of
the induced modulation of P's apparent contrast was on the order of
0.45 C_P. By comparison, when the noises in P and S, respectively, are
filtered into nonoverlapping, octave-wide spatial frequency bands, the
modulation of C_S has very little effect on the apparent contrast of P.

These results suggest that the perceived lightness or darkness of a
point in space depends on the combined responses of multiple units at
that point, where each unit is tuned to a specific band of spatial
frequencies, and the response of each unit is normalized relative to the
responses of nearby units of the same type.
Abstract

Microbalanced stimuli are dynamic displays which do not
stimulate motion mechanisms that apply standard (Fourier-energy
or autocorrelational) motion analysis directly to the
visual signal. Because they bypass such first-order
mechanisms, microbalanced stimuli are uniquely useful for
studying second-order motion perception (motion perception
served by mechanisms that require a grossly nonlinear stimulus
transformation prior to standard motion analysis). Some
stimuli are microbalanced under all pointwise stimulus
transformations and therefore are immune to early visual
nonlinearities. We use them to disable motion information
derived from spatial (temporal) filtering in order to isolate the
temporal (spatial) properties of space/time separable
second-order motion mechanisms. The motion of all of the
microbalanced stimuli we consider can be extracted by (1a)
band-selective spatial filtering and (1b) biphasic temporal
filtering, nonzero in dc, followed by (2) a rectifying
nonlinearity and (3) standard motion analysis.


1. Introduction.

Standard motion analysis. A visual display is described
by L(x,y,t), its luminance as a function of space, x,y, and
time, t. We use the term standard motion analysis for any
computation applied to L that derives L's motion from
correlations of L-values across time and space. Such
computations are consonant with the motion-from-Fourier-components
principle, which states that L's motion is reflected
in some reasonable way by the contributions to L of individual
Fourier components (drifting sinusoidal gratings). The
recently proposed motion-perception theories of Adelson &
Bergen [1], Heeger [5], van Santen & Sperling [3,4], and
Watson & Ahumada [2] all perform various forms of standard
motion analysis on their input. Similarly, the computer vision
models of Anandan [9] and Waxman & Bergholm [10] also
perform standard motion analysis on the input signal.

First-order mechanisms. A fundamental transformation
generally presumed to be subjected to standard motion analysis
in human visual processing is the contrast of the signal (the
normalized deviation of luminance from its locally computed
mean). We call mechanisms first-order that apply standard
motion analysis to raw stimulus contrast. Any motion
mechanism that applies a grossly nonlinear transformation to
the stimulus prior to standard motion analysis, we call
second-order.

It is becoming clear, from apparently moving stimuli
which do not stimulate standard motion detectors, that
first-order mechanisms cannot account for all the data [11-28]. In
particular, Chubb and Sperling [24,26,27] have demonstrated a
variety of stimuli which display consistent, unambiguous
apparent motion, yet which do not systematically stimulate
first-order mechanisms.

The methods used by Chubb & Sperling [26] to construct
apparent motion stimuli devoid of systematic first-order motion
content are founded on the notion of a microbalanced random
stimulus. A random stimulus I is microbalanced iff, for any
space/time separable function W, the result J = WI of
multiplying I by W satisfies the following condition (J is drift
balanced): the expected power in J of any given drifting
sinusoidal grating is equal to the expected power in J of the
grating of the same spatial frequency, drifting at the same rate,
but in the opposite direction. Drift-balanced and
microbalanced random stimuli are useful for studying motion
perception because they provide flexible access to
second-order motion mechanisms without systematically engaging
first-order mechanisms.

In this paper, we begin by reviewing the basic results
about drift-balanced and microbalanced random stimuli, then
apply these findings to generate a collection of microbalanced
stimuli displaying various types of motion. The motion of
each of the stimuli we consider is best revealed to standard
analysis by a space/time separable linear filter followed by a
rectifier. The first two microbalanced stimuli we discuss
(stimuli 3.1 and 3.2) place important constraints on the
temporal filtering mediating space/time separable,
second-order motion perception. The motion of each of the last four
stimuli (4.2.2, 4.2.3, 4.2.5, and 4.2.6) depends only on the
spatial filtering stage (temporal filtering alone, followed by
rectification, cannot expose the motion of these stimuli).

A transformation is pointwise if its output value at a point
(x,y,t) in space/time depends only on the value of the input at
(x,y,t). Pointwise transformations include what are often
called "static nonlinearities." Stimuli 3.2, 4.2.2, 4.2.3, 4.2.5 and
4.2.6 all remain microbalanced after arbitrary pointwise
transformations. We present general methods for constructing
stimuli of this sort.
A transformation is purely temporal if its output value at
a point (x,y,t) depends only on the history of input at (x,y).
The class of purely temporal transformations is very general
and includes, for example, temporal bandpass filtering
preceded and followed by arbitrary pointwise transformations.
Stimuli 4.2.2, 4.2.3, 4.2.5 and 4.2.6 remain microbalanced
after any purely temporal transformation. Such stimuli are
extremely useful for investigating second-order motion
perception, because they provide a critical measure of control
in differentially stimulating specific second-order mechanisms.
Indeed, under virtually all models of visual processing, the
effective transformation mediating the perception of motion
displayed by such stimuli is bound to be a spatial linear filter
(a "texture-grabber"). This linear stage must, of course, be
followed by a pointwise nonlinearity (such as rectification or
thresholding) to expose the microbalanced stimulus motion to
standard analysis.

2.  Preliminaries. 

Section outline. In this section we state the background facts
presupposed by the main discussion of the paper. The broad
topics covered are:

•  Real-valued, discrete visual stimuli and their Fourier
transforms. We take a stimulus to be a real-valued function
whose action is restricted to a finite grid of spatiotemporal
sampling locations.

•  Transformations. Definitions are given of linear
shift-invariant transformations, and pointwise transformations.

•  Random stimuli. A random stimulus is a jointly distributed
set of random variables assigned to a grid of spatiotemporal
sampling locations.

•  Drift-balanced and microbalanced random stimuli. A
random stimulus I is drift balanced iff the expected power
contributed to I by any given Fourier component (drifting
sinusoidal grating) is equal to the expected power in I of the
grating of the same spatial frequency drifting at the same rate,
but in the opposite direction. I is microbalanced iff WI is drift
balanced for any space/time separable function W that
"windows" I. The class of microbalanced random stimuli is
significant for studying motion perception, since (i) it is easy
to construct a broad range of microbalanced random stimuli
which display consistent, compelling apparent motion across
independent realizations, despite the fact that (ii) the motion
displayed by any microbalanced random stimulus is invisible
to first-order mechanisms, regardless of the spatiotemporal
scope over which they perform their motion analysis.

Notation. We denote the convolution of f with g by f * g, and the
product of f with g by fg.

2.1. Discrete dynamic visual stimuli and their Fourier
transforms.

We let R denote the real numbers, and Z (Z+) the integers
(positive integers).


Contrast modulation. Luminance L(x,y,t) is
physically constrained to be a non-negative quantity.
Psychophysically, the significant quantity is contrast, the
normalized deviation at each time t of luminance at each point
(x,y) in the visual field from a "background level", or "level
of adaptation", which reflects the average luminance over
points proximal to (x,y,t) in space and time. We shall restrict
our attention throughout this paper to stimuli for which it can
be assumed that the background luminance level L_0 is uniform
over the significant spatiotemporal locations in the display.

For any stimulus L with base luminance L_0, call the
function I satisfying

$$L = L_0 (1 + I)$$

the contrast modulator of L (and note that I ≥ -1).

Psychophysically, it is well-established that over
substantial ranges of L_0, the apparent motion of L does not
depend upon L_0. Therefore, we shift our focus from luminance
to contrast, and identify a stimulus with its contrast modulator,
dropping reference to background level.
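
As a small illustration of the definition above, the following sketch converts luminance samples into contrast-modulator values by solving L = L_0(1 + I) for I. The function name `contrast_modulator` and the sample values are hypothetical and chosen only for illustration; a single uniform background level is assumed, as in the text.

```python
def contrast_modulator(luminance, background):
    """Return I = L / L_0 - 1 for each luminance sample.

    Assumes a uniform, strictly positive background level L_0.
    """
    if background <= 0:
        raise ValueError("background luminance must be positive")
    if any(value < 0 for value in luminance):
        raise ValueError("luminance is a non-negative quantity")
    return [value / background - 1.0 for value in luminance]


# Hypothetical luminance samples in arbitrary units, with L_0 = 100.
frame = [0.0, 50.0, 100.0, 150.0]
contrast = contrast_modulator(frame, background=100.0)
```

Because luminance cannot be negative, every value returned is at least -1, which is the bound noted in the text.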

Stimuli. We restrict ourselves to discrete stimuli, whose
activity is restricted to a finite grid of points in space/time.
Specifically, we call any function I: Z³ → R a stimulus iff
I[x,y,t] = 0 for all but finitely many points of Z³. We shall
be considering stimuli as functions of two spatial dimensions
and time. The reader may find it convenient to think of the
first spatial dimension (always indexed by x) as horizontal,
with values increasing to the right, the second spatial
dimension (indexed by y) as vertical, with values increasing
upward. The temporal dimension is indexed by t. For
concreteness, the reader is encouraged to imagine Z³ as
indexing the pixels in a dynamic digital display.

Because any stimulus I is nonzero at only a finite number
of points, the power in I is finite, from which we observe that I
has a well-defined Fourier transform.

We denote I's Fourier transform by Î:

$$\hat{I}(\omega,\theta,\tau) = \sum_{(x,y,t) \in Z^3} I[x,y,t]\, e^{-j(\omega x + \theta y + \tau t)}.$$

Although Î is defined for all real numbers ω, θ, τ, it is
periodic over 2π in each variable. This fact is reflected in the
inverse transform:

$$I[x,y,t] = \frac{1}{(2\pi)^3} \int_0^{2\pi}\!\int_0^{2\pi}\!\int_0^{2\pi} \hat{I}(\omega,\theta,\tau)\, e^{j(\omega x + \theta y + \tau t)}\, d\omega\, d\theta\, d\tau.$$

In the Fourier domain, ω indexes frequencies relative to x, θ
indexes frequencies relative to y, and τ indexes frequencies
relative to t.

We distinguish the stimulus 0 by setting 0[x,y,t] = 0 for
all (x,y,t) ∈ Z³.

Any stimulus I is called space/time separable if
I[x,y,t] = g[x,y] h[t] for some real-valued functions g
and h of space and time respectively.
2.2. Transformations.

Any function T which takes the set of real-valued
functions of Z³ into itself is called a transformation. If, for
instance, f: Z³ → R then T(f): Z³ → R, and we write
T(f)[x,y,t] to indicate the value of T(f) at any point
(x,y,t) ∈ Z³. We shall be particularly concerned with two
types of transformations: linear shift-invariant transformations,
and pointwise transformations.

Pointwise transformations, rectifiers. For any functions
f: A → B and g: B → C, the composition g∘f: A → C is
given by

$$g \circ f(a) = g(f(a))$$

for any a ∈ A. Then for any f: R → R, we call the
transformation f∘, yielding the spatiotemporal function f∘I
when applied to stimulus I, a pointwise transformation
(because its output value at any point in space/time depends
only on its input value at that point). The transformation f∘ is
called a positive half-wave rectifier if f is monotonically
increasing, and f(v) = 0 for all v ≤ 0. f∘ is called a negative
half-wave rectifier if f is monotonically decreasing, and
f(v) = 0 for v ≥ 0. Finally, f∘ is called a full-wave rectifier if
f is a monotonically increasing function of absolute value.
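
The three kinds of rectifier can be sketched with their simplest piecewise-linear instances. These particular functions are one assumed choice among the many that satisfy the definitions above, and the dictionary representation of a stimulus (sample point mapped to contrast value) and the names `pointwise`, `positive_halfwave`, `negative_halfwave` and `fullwave` are hypothetical.

```python
def positive_halfwave(v):
    """Zero for v <= 0, increasing for v > 0."""
    return max(v, 0.0)


def negative_halfwave(v):
    """Zero for v >= 0, decreasing in v for v < 0."""
    return max(-v, 0.0)


def fullwave(v):
    """An increasing function of absolute value."""
    return abs(v)


def pointwise(f, stimulus):
    """Apply f independently at every sample of a stimulus.

    `stimulus` maps sample points (x, y, t) to contrast values; the output
    at each point depends only on the input value at that same point.
    """
    return {point: f(value) for point, value in stimulus.items()}


# Hypothetical two-sample stimulus.
stimulus = {(0, 0, 0): -0.5, (1, 0, 0): 0.25}
rectified_pos = pointwise(positive_halfwave, stimulus)
rectified_neg = pointwise(negative_halfwave, stimulus)
rectified_full = pointwise(fullwave, stimulus)
```

With these particular choices the full-wave output equals the sum of the two half-wave outputs at every point, which is why a full-wave rectifier gives same-sign output for positive and negative signal components.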

Linear, shift-invariant transformations. Linear,
shift-invariant (LSI) transformations are spatiotemporal
convolutions: For k: Z³ → R (the impulse response), the LSI
transformation k* yields the convolution k * I when applied
to any stimulus I: i.e., for any α ∈ Z³,

$$k * I[\alpha] = \sum_{\beta \in Z^3} I[\beta]\, k[\alpha - \beta].$$
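
The convolution sum can be written directly for finitely supported stimuli. The sketch below is a minimal illustration that assumes both the impulse response k and the stimulus I are stored as dictionaries from integer (x, y, t) triples to values; the function name `convolve` and the example kernel are hypothetical.

```python
from collections import defaultdict


def convolve(k, stimulus):
    """Compute (k * I)[alpha] = sum over beta of I[beta] * k[alpha - beta].

    Both arguments map (x, y, t) integer triples to real values and are
    zero wherever no key is present.
    """
    out = defaultdict(float)
    for beta, i_val in stimulus.items():
        for delta, k_val in k.items():
            # delta plays the role of alpha - beta, so alpha = beta + delta.
            alpha = tuple(b + d for b, d in zip(beta, delta))
            out[alpha] += i_val * k_val
    return {point: value for point, value in out.items() if value != 0.0}


# Hypothetical impulse response: a one-frame temporal difference,
# so that (k * I)[x, y, t] = I[x, y, t] - I[x, y, t - 1].
k = {(0, 0, 0): 1.0, (0, 0, 1): -1.0}

impulse = {(2, 0, 5): 1.0}
shifted_impulse = {(3, 0, 7): 1.0}

response = convolve(k, impulse)
shifted_response = convolve(k, shifted_impulse)
```

Shifting the input by (1, 0, 2) shifts the output by the same amount, which is the shift-invariance property in the name.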

2.3.  Random  stimuli. 

The notion of a random stimulus generalizes that of a
nonrandom stimulus in that the values assigned points in
space/time by a random stimulus are random variables rather
than constants. A random stimulus is a family
{R[x,y,t] | (x,y,t) ∈ Z³} of random variables, all but some
finite number of which are always 0. To ensure that R has a
well-defined expected power spectrum we require that
R[x,y,t] has a finite second moment for each (x,y,t) ∈ Z³:

2.3.1. Call any family {R[x,y,t] | (x,y,t) ∈ Z³} of jointly
distributed random variables a random stimulus provided

(i) for all but finitely many (x,y,t) ∈ Z³, R[x,y,t] is
invariably equal to 0,
and

(ii) E[R[x,y,t]²] exists for all (x,y,t) ∈ Z³.

As with non-random stimuli, we write R̂ for the Fourier
transform of the random stimulus R. R is called space/time
separable iff R is space/time separable with probability 1. If
there exists a stimulus S such that R = S with probability 1,
then R is called constant.


2.4. Drift-balanced and microbalanced random stimuli.

The motion-from-Fourier-components principle is a
commonly encountered rule of thumb for predicting the
apparent motion of an arbitrary stimulus I[x,y,t] = f[x,t]
that is constant in the vertical dimension of space. It states
that, for I considered as a linear combination of drifting
sinusoidal gratings, if the power in I of the rightward-drifting
gratings is greater than the power of the leftward-drifting
gratings, then apparent motion should be to the right.
Conversely, if most of I's power resides in the
leftward-drifting gratings, apparent motion should be to the left.
Otherwise I should manifest no decisive motion in either
direction.

This prediction rule for horizontally moving stimuli is a
restricted version of the more general
motion-from-Fourier-components principle: For any stimulus L to exhibit motion in
a certain direction in the neighborhood of some point
(x,y,t) ∈ Z³, there must be some spatiotemporal volume A
proximal to (x,y,t) such that the Fourier transform of L
computed locally across A has substantial power over some
regions of the frequency domain whose points correspond, in
the space/time domain, to sinusoidal gratings drifting in a
direction consistent with the motion perceived.

The following class of random stimuli provides a rich
pool of counterexamples to the motion-from-Fourier-components
principle [26].

2.4.1. Call any random stimulus R drift balanced iff

$$E\left[\,|\hat{R}(\omega,\theta,\tau)|^2\,\right] = E\left[\,|\hat{R}(\omega,\theta,-\tau)|^2\,\right]$$

for all (ω,θ,τ) ∈ R³.

Thus, a random stimulus R is drift balanced iff the
expected power in R of each drifting sinusoidal component is
equal to the expected power of the component of the same
spatial frequency, drifting at the same rate, but in the opposite
direction. That is, the expected power of every frequency is
the same, independently of whether a series of frames is
displayed in forward or reverse order. Obviously, for any class
of spatiotemporal receptors tuned to stimulus power in a
certain spatiotemporal frequency band, a drift-balanced
random stimulus will, on the average, stimulate equally well
those receptors tuned to the corresponding band of opposite
temporal orientation.
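
The drift-balance condition can be probed numerically for simple cases. The sketch below is only an illustration under stated assumptions: the stimuli are nonrandom (so expectations are trivial), they vary only in x and t, and the transform uses the kernel exp(-j(ωx + τt)), an assumed sign convention. The names `power` and `drift_imbalance` and the example stimuli are hypothetical.

```python
import cmath


def power(stimulus, omega, tau):
    """Squared magnitude of the Fourier transform of an xt stimulus.

    `stimulus` maps (x, t) sample points to contrast values.
    """
    total = sum(value * cmath.exp(-1j * (omega * x + tau * t))
                for (x, t), value in stimulus.items())
    return abs(total) ** 2


def drift_imbalance(stimulus, omega, tau):
    """Power at (omega, tau) minus power at (omega, -tau).

    A drift-balanced stimulus gives 0 for every (omega, tau).
    """
    return power(stimulus, omega, tau) - power(stimulus, omega, -tau)


# A dot stepping rightward one pixel per frame.
moving_dot = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0}

# A space/time separable stimulus g[x] * h[t], with arbitrary g and h.
g = {0: 1.0, 1: -0.5, 2: 0.25}
h = {0: 1.0, 1: 2.0, 2: -1.0}
separable = {(x, t): gx * ht for x, gx in g.items() for t, ht in h.items()}

# The pair (omega, tau) = (1, -1) corresponds to a rightward-drifting grating.
moving_gap = drift_imbalance(moving_dot, 1.0, -1.0)
separable_gap = drift_imbalance(separable, 1.0, -1.0)
```

The moving dot has far more power in the rightward-drifting component than in its leftward counterpart, so it is not drift balanced, whereas the separable stimulus shows no imbalance at this frequency pair up to rounding error. Checking one frequency pair does not establish drift balance in general; it only illustrates the comparison the definition calls for.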

Microbalanced Random Stimuli. Consider the
following two-flash stimulus S: In flash 1, a bright spot (call it
Spot 1) appears. In flash 2, Spot 1 disappears, and two new
spots appear, one to the left and one symmetrically to the right
of Spot 1. As one might suppose, S is drift balanced. On the
other hand, it is equally clear that a first-order motion detector
whose spatial reach encompassed the location of Spot 1 and
only one of the spots in flash 2 might well be stimulated in a
fixed direction by S. Thus, although S is drift balanced, some
first-order motion detectors may be stimulated strongly and
systematically by S. These detectors can be differentially
selected by spatial windowing, and thereby the drift-balanced
stimulus S can be converted into a non-drift-balanced stimulus
by multiplying it by an appropriate space/time separable
function. This property is escaped by the following subclass of
drift-balanced random stimuli.

2.4.2. Call any random stimulus I microbalanced iff WI is
drift balanced for any space/time separable (non-random)
function W.

One can think of the multiplying function W as a
"window" through which a spatiotemporal subregion of I can
be "viewed" in isolation. The space/time separability of W
ensures that it is "transparent" with respect to the motion
content of the region of I to which it is applied: W does not
distort I's motion with any motion content of its own. Thus,
the fact that I is microbalanced means that any subregion of I
encountered through a "motion-transparent window" is drift
balanced.
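
The two-flash stimulus S described earlier gives a concrete case in which windowing destroys drift balance. The sketch below is a hedged illustration that treats S as a nonrandom xt stimulus, places Spot 1 at x = 0 with the flash-2 spots at x = -1 and x = +1, and uses an assumed transform kernel exp(-j(ωx + τt)). The names `power`, `imbalance` and `window`, and the particular window, are hypothetical.

```python
import cmath


def power(stimulus, omega, tau):
    """Squared magnitude of the Fourier transform of an xt stimulus."""
    total = sum(value * cmath.exp(-1j * (omega * x + tau * t))
                for (x, t), value in stimulus.items())
    return abs(total) ** 2


def imbalance(stimulus, omega, tau):
    """Power at (omega, tau) minus power at (omega, -tau)."""
    return power(stimulus, omega, tau) - power(stimulus, omega, -tau)


def window(stimulus, g, h):
    """Multiply a stimulus by the space/time separable W[x, t] = g(x) * h(t)."""
    return {(x, t): g(x) * h(t) * value for (x, t), value in stimulus.items()}


# Flash 1 (t = 0): Spot 1 at x = 0. Flash 2 (t = 1): spots at x = -1 and x = +1.
two_flash = {(0, 0): 1.0, (-1, 1): 1.0, (1, 1): 1.0}

# A window that passes Spot 1 and only the right-hand spot of flash 2.
windowed = window(two_flash,
                  g=lambda x: 1.0 if x >= 0 else 0.0,
                  h=lambda t: 1.0)

full_gap = imbalance(two_flash, 1.0, -1.0)
windowed_gap = imbalance(windowed, 1.0, -1.0)
```

At this frequency pair the unwindowed stimulus shows no imbalance, while the windowed version has excess power in the rightward-drifting component. So S passes this drift-balance check but is not microbalanced, since a space/time separable window exposes systematic motion.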

The following characterization of the class of
microbalanced random stimuli, and the rest of the results in
this section, are from Chubb and Sperling [26].

2.4.3. A random stimulus I is microbalanced if and only if

$$E\left[\, I[x,y,t]\, I[x',y',t'] - I[x,y,t']\, I[x',y',t] \,\right] = 0$$

for all (x,y,t), (x',y',t') ∈ Z³.

Some other relevant facts about microbalanced random
stimuli:

2.4.4. For any independent microbalanced random stimuli I
and J,

(i) the product IJ is microbalanced,
and

(ii) the convolution I * J is microbalanced.


2.4.5. (a) Any space/time separable random stimulus is
microbalanced; (b) any constant microbalanced random
stimulus is space/time separable.

The following result is useful in constructing a wide range
of microbalanced random stimuli which display striking
apparent motion.

2.4.6. Let Γ be a family of pairwise independent,
microbalanced random stimuli, all but at most one of which
have expectation 0. Then any linear combination of Γ is
microbalanced.

The Reichardt detector characterization of
microbalanced random stimuli. Two first-order motion
detectors proposed for psychophysical data [1,6] can be recast
as variants of a Reichardt detector [3,4,31]. The Reichardt
detector has many useful properties as a motion detector
without regard to its specific instantiation [3,4].

Figure 1 shows a diagram of the Reichardt detector. The
Reichardt detector consists of a left and a right subunit that
share their inputs. The left subunit normally computes
leftward motion because the filter g1* acts as an internal delay
to match the external delay of a moving stimulus. The right
subunit normally computes rightward motion. The output
represents the smoothed leftward minus rightward difference.


Figure 1: The Reichardt detector. The detector consists of a
left and a right subunit: the left unit normally detects leftward
movement; the right unit, rightward movement. In response to
a stimulus I, each spatial input filter (receptive field) fi
outputs a temporal function that is then convolved with a
temporal filter gj*. The correlator boxes, marked "×", output
the product of their inputs. The box marked "−" outputs its
left input minus its right: this output indicates the net leftward
minus rightward motion. The box h* contains a temporal
smoothing filter to produce time-averaged output.

Specifically, the Reichardt detector consists of spatial
receptors characterized by spatial window functions (receptive
fields) f1 and f2, temporal filters g1* and g2*, multipliers, a
differencer, and another temporal filter h*. The spatial
receptors fi, i = 1, 2, act on the input stimulus I to produce
intermediate outputs,

$$y_i[t] = \sum_{(x,y)} f_i[x,y]\, I[x,y,t].$$

At the next stage, each temporal filter gj* transforms its input
yi (i, j = 1, 2), yielding four temporal output functions:
yi * gj. The left and right multipliers then compute the
products

$$[y_1 * g_1[t]]\,[y_2 * g_2[t]] \quad \text{and} \quad [y_1 * g_2[t]]\,[y_2 * g_1[t]]$$

respectively, and the differencer subtracts the output of the
right multiplier from that of the left multiplier:

$$D[t] = [y_1 * g_1[t]]\,[y_2 * g_2[t]] - [y_1 * g_2[t]]\,[y_2 * g_1[t]].$$
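
The detector stages up to the differencer output D can be sketched for stimuli with one spatial dimension. Everything specific in this sketch is an assumption made for illustration: frames are stored as a list of rows indexed frames[t][x], the receptive fields are single pixels, g1 is a one-frame delay, g2 is the identity, and the final smoothing filter is omitted. The function names are hypothetical.

```python
def receptor_output(frames, f):
    """y[t] = sum over x of f[x] * I[x, t], for frames indexed as frames[t][x]."""
    return [sum(fx * ix for fx, ix in zip(f, frame)) for frame in frames]


def temporal_filter(y, g):
    """Causal convolution (y * g)[t] = sum over s of g[s] * y[t - s]."""
    return [sum(g[s] * y[t - s] for s in range(len(g)) if t - s >= 0)
            for t in range(len(y))]


def reichardt(frames, f1, f2, g1, g2):
    """Differencer output D[t] = (y1*g1)(y2*g2) - (y1*g2)(y2*g1)."""
    y1 = receptor_output(frames, f1)
    y2 = receptor_output(frames, f2)
    left = [a * b for a, b in zip(temporal_filter(y1, g1), temporal_filter(y2, g2))]
    right = [a * b for a, b in zip(temporal_filter(y1, g2), temporal_filter(y2, g1))]
    return [l - r for l, r in zip(left, right)]


# Assumed receptive fields: f1 samples x = 1 and f2 samples x = 0, so that a
# delayed signal from f1 coincides with an undelayed signal from f2 when a
# dot steps leftward.
f1, f2 = [0, 1, 0], [1, 0, 0]
g1, g2 = [0, 1], [1]          # one-frame delay and identity

leftward = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]    # dot at x = 2, 1, 0
rightward = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # dot at x = 0, 1, 2
flicker = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]     # no net motion

d_left = reichardt(leftward, f1, f2, g1, g2)
d_right = reichardt(rightward, f1, f2, g1, g2)
d_flicker = reichardt(flicker, f1, f2, g1, g2)
```

With these choices the summed output is positive for the leftward-stepping dot, negative for the rightward-stepping dot, and zero for the stationary pattern, matching the convention that the output indicates leftward minus rightward motion.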
Figure 2: First-order and second-order motion mechanisms. (a) First-order motion mechanisms apply standard motion
analysis (e.g., Reichardt model) directly to the luminance signal L. (b) Many second-order mechanisms can be modeled by a
signal transformation comprised of a spatiotemporal linear filter followed by a pointwise nonlinearity followed by
standard motion analysis. The filtering performed in (b) is space/time separable (spatial filtering and temporal filtering
occur in separate boxes), followed by a pointwise nonlinearity, which is illustrated here with a full-wave rectifier. The
motion of all the microbalanced stimuli considered in this paper can be extracted by the second-order mechanisms
diagrammed in (b) with appropriately chosen spatial and temporal filters.


The final output is produced by applying the filter h*, whose
purpose is to appropriately smooth the time-varying
differencer output D. Since almost all first-order mechanisms
can be expressed as, or closely approximated by, Reichardt
detectors, the following result [27] is the cornerstone of the
claim that microbalanced random stimuli bypass first-order
motion mechanisms.

2.4.7. For any random stimulus I, the following conditions are
equivalent:

(a) I is microbalanced.

(b) The expected response of any Reichardt detector to I is
0 at every instant in time.

Varieties of microbalanced motion.

In Sections 3 and 4, we describe six random stimuli, all of which are microbalanced, yet display
consistent apparent motion across independent realizations. For each of these random stimuli I,
the motion displayed by I can be exposed to standard motion analysis by a transformation

$$T(I) = r \bullet (f * I), \qquad (1)$$

where r• is a rectifier, and f* is a space/time separable filter.
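As an illustration only, the transformation of Eq. (1) can be sketched for a discrete xt stimulus. The function names, the circular boundary handling, and the toy input below are assumptions made for this sketch; they are not taken from the paper.

```python
import numpy as np

def circular_convolve(signal, kernel, axis):
    """Circular convolution along one axis: out[n] = sum_j kernel[j] * signal[n - j]."""
    return sum(k * np.roll(signal, j, axis=axis) for j, k in enumerate(kernel))

def transform(stim, spatial_kernel, temporal_kernel, rectifier):
    """T(I) = r(f * I) for an xt stimulus of shape (frames, positions).

    The filter f is space/time separable: a spatial kernel applied along
    axis 1, then a temporal kernel applied along axis 0.
    """
    stim = np.asarray(stim, dtype=float)
    filtered = circular_convolve(stim, spatial_kernel, axis=1)
    filtered = circular_convolve(filtered, temporal_kernel, axis=0)
    return rectifier(filtered)

# Three pointwise rectifiers (hypothetical names).
full_wave = np.abs

def half_wave_positive(v):
    return np.maximum(v, 0.0)

def half_wave_negative(v):
    return np.maximum(-v, 0.0)

# Toy stimulus: 3 frames, 2 spatial positions.
stim = np.array([[1.0, -1.0], [-1.0, -1.0], [1.0, 1.0]])

# f = identity: simple rectification.
identity_then_rectify = transform(stim, [1.0], [1.0], full_wave)

# f = temporal backward difference (a discrete stand-in for d/dt), then rectify.
difference_then_rectify = transform(stim, [1.0], [1.0, -1.0], full_wave)
```

Passing `[1.0]` for both kernels gives simple rectification; a temporal kernel of `[1.0, -1.0]` is a crude discrete substitute for temporal differentiation.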


3. Motion mediated by simple rectification and by temporal differentiation followed by
rectification.

The first two stimuli (3.1 and 3.2) place constraints only on the temporal component of the filter
f*. Subsequent stimuli focus on the spatial component.

3.1. Stimulus: The amplitude-modulating squarewave. The motion of some of the microbalanced
stimuli demonstrated by Chubb & Sperling [24,26] results from modulating the amplitude of
spatially independent visual noise. For example, Fig. 3a shows an xt cross-section of a
squarewave, stepping 1/4 spatial cycle leftward each frame, modulating (between 0 and 1) the
amplitude of a row of static, horizontally independent black/white vertical bars. This stimulus
displays obvious leftward motion to all viewers under a broad range of viewing conditions, despite
the fact that (as is easily proven from propositions 2.4.5a and 2.4.6) it is microbalanced.

Figure 3: Transformations of the contrast-modulating squarewave. All 12 panels are xt
cross-sections (with time running downward) of various transformations of stimulus 3.1, the
contrast-modulating squarewave. Stimulus 3.1 itself is cross-sectioned in (a). The horizontal
dimension is x, the vertical dimension is t with time increasing downward, and the stimulus is
unvarying in y. The problem of perceiving leftward motion in the dynamic display whose xt
cross-section is represented by panel (a) is equivalent to the texture problem of perceiving
orientation slanting down and to the left in panel (a) itself. In the left-hand column are
displayed cross-sections of 3 linear transformations of the stimulus: (a) the identity, (b) the
partial derivative with respect to time, and (c) the average of the operators applied in (a) and
(b). The next column (a1, b1, c1) shows the result of full-wave rectification (absolute value) of
the corresponding (same-row) linear transformations; e.g., (a1) shows the result of full-wave
rectifying the untransformed stimulus 3.1. Column three shows the positive half-wave components of
the same-row linear transformations in column 1; column 4 shows the negative half-wave components.
The functions in column 1 (linear transformations of the contrast-modulating squarewave) are all
microbalanced; hence, the right-to-left motion displayed by the stimulus cannot be obtained from
these transformations by standard motion analysis. Temporal differentiation (the second-row
transformations) yields motion-ambiguous functions; rows 1 and 3 yield functions whose motion is
extractable by standard motion analysis.

Simple rectification exposes the motion of the amplitude-modulating squarewave. As suggested by
Figs. 3a1, 3a2, and 3a3, simple full-wave or half-wave rectification (i.e. setting f* to the
identity in Eq. (1)) suffices to expose motion carried by amplitude modulation. However, simple
rectification fails to expose the motion in the following stimulus.
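The behaviour of stimulus 3.1 can be checked numerically with a small simulation. This is a sketch under stated assumptions: the sizes (`PERIOD`, `STEP`, `WIDTH`, `FRAMES`), the one-pixel bar width, the wraparound boundaries, and the opponent frame-to-frame correlator used here as a simplified stand-in for "standard motion analysis" are all choices made for illustration, not the paper's implementation.

```python
import numpy as np

# Hypothetical sizes: an 8-pixel squarewave period stepping 2 pixels
# (1/4 cycle) leftward per frame, over 64 one-pixel bars and 16 frames.
PERIOD, STEP, WIDTH, FRAMES = 8, 2, 64, 16

def modulator():
    """0/1 squarewave stepping STEP pixels leftward per frame, shape (FRAMES, WIDTH)."""
    t = np.arange(FRAMES)[:, None]
    x = np.arange(WIDTH)[None, :]
    return (((x + STEP * t) % PERIOD) < PERIOD // 2).astype(float)

def amplitude_modulated(rng):
    """One realization: static, independent +/-1 bars gated by the squarewave."""
    bars = rng.choice([-1.0, 1.0], size=WIDTH)
    return modulator() * bars[None, :]

def leftward_correlator(stim, d=STEP):
    """Opponent correlator with wraparound; positive output signals leftward motion."""
    nxt = np.roll(stim, -1, axis=0)                    # nxt[t, x] = stim[t+1, x]
    left = np.mean(stim * np.roll(nxt, d, axis=1))     # stim[t, x] * stim[t+1, x-d]
    right = np.mean(stim * np.roll(nxt, -d, axis=1))   # stim[t, x] * stim[t+1, x+d]
    return left - right

rng = np.random.default_rng(0)
trials = [amplitude_modulated(rng) for _ in range(2000)]

# Average response to the raw stimulus: close to zero across realizations.
raw = np.mean([leftward_correlator(s) for s in trials])

# Average response after full-wave rectification: clearly leftward.
rectified = np.mean([leftward_correlator(np.abs(s)) for s in trials])
```

In this toy setting the raw response averages out across realizations, whereas the rectified stimulus reduces to the deterministic modulating squarewave and drives the correlator consistently leftward.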

3.2. Stimulus: The contrast-reversing squarewave. A sideways stepping squarewave is used to
alternately multiply the contrast of spatially independent noise by +1 and -1. Fig. 4a shows an xt
cross-section of a squarewave that steps leftward 1/4 spatial cycle at regular temporal intervals,
reversing the contrast of black/white vertical bars as it moves. Like the amplitude-modulating
squarewave, this contrast-reversing squarewave displays vivid leftward motion to all viewers under
a broad range of viewing conditions; nonetheless, it is microbalanced (another easy consequence of
propositions 2.4.5a and 2.4.6).

Simple rectification fails to reveal the motion of the contrast-reversing squarewave. As
illustrated in Figs. 4a1, 4a2 and 4a3, simple rectification does not expose the motion of the
contrast-reversing squarewave to standard motion analysis; full-wave rectification yields a
uniform field, while half-wave rectification yields a mere dc-shifted rescaling of the original
stimulus. Indeed, any purely spatial filter followed by rectification is equally ineffective at
revealing this motion [27].

Figure 4: Transformations of the contrast-reversing squarewave. All 12 panels are xt
cross-sections (with time running downward) of various transformations of stimulus 3.2, the
contrast-reversing squarewave. Stimulus 3.2 itself is cross-sectioned in (a). See the caption of
Fig. 3 for a description of the transformations and panel arrangement. The functions in column 1
(linear transformations of the contrast-reversing squarewave) are all microbalanced, so the motion
displayed by the stimulus cannot be obtained from these transformations by standard motion
analysis. Purely pointwise transformations (the rectifications shown in the first row) do not
yield functions with accessible motion. However, after any of the rectifying transformations in
row (b) or (c), the stimulus motion is accessible to standard motion analysis.
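The two failures of simple rectification described for stimulus 3.2 are easy to confirm on a toy realization. The sizes and the one-pixel bar width below are assumptions chosen for this sketch.

```python
import numpy as np

# Hypothetical sizes: 8-pixel period, 2-pixel (1/4 cycle) leftward step per frame.
PERIOD, STEP, WIDTH, FRAMES = 8, 2, 64, 16

def contrast_reversing(rng):
    """Static +/-1 bars multiplied by a leftward-stepping +1/-1 squarewave."""
    t = np.arange(FRAMES)[:, None]
    x = np.arange(WIDTH)[None, :]
    sign = np.where(((x + STEP * t) % PERIOD) < PERIOD // 2, 1.0, -1.0)
    bars = rng.choice([-1.0, 1.0], size=WIDTH)
    return sign * bars[None, :]

stim = contrast_reversing(np.random.default_rng(1))

full_wave = np.abs(stim)            # every sample is +1 or -1, so this is all ones
half_wave = np.maximum(stim, 0.0)   # equals (stim + 1) / 2 for a +/-1 stimulus
```

Because every sample is +1 or -1, full-wave rectification discards all structure, and half-wave rectification is an affine copy of the input, so neither changes what standard motion analysis can extract.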


Temporal differentiation followed by rectification reveals the motion of the contrast-reversing
squarewave. The obvious transformation to expose the motion of this stimulus to standard motion
analysis is temporal differentiation followed by half-wave or full-wave rectification. The result
of differentiating the contrast-reversing squarewave with respect to time is shown in Fig. 4b.
This temporal derivative remains microbalanced (a consequence of propositions 2.4.4 ii and
2.4.5a). However, as suggested by Figs. 4b1, 4b2 and 4b3, either full-wave (Fig. 4b1) or half-wave
(Figs. 4b2 & 4b3) rectification suffices to reveal the motion of the temporal derivative of the
contrast-reversing squarewave to standard analysis. However,

Temporal differentiation followed by rectification fails to expose the motion of the
amplitude-modulating squarewave. Differentiating the amplitude-modulating squarewave (Fig. 3a)
with respect to time sacrifices all the motion content of this stimulus (see Fig. 3b). The
differentiated stimulus (Fig. 3b) is completely ambiguous in motion content, and subsequent
transformations (e.g. full- or half-wave rectification; Figs. 3b1, 3b2, 3b3) cannot reclaim the
original stimulus motion.


To recapitulate: The motion of the amplitude-modulating squarewave (Fig. 3a) is exposed by simple
half-wave or full-wave rectification (Figs. 3a1, 3a2, 3a3). However, rectification fails to expose
the motion of the contrast-reversing squarewave (signal in Fig. 4a; rectifications in Figs. 4a1,
4a2, 4a3). On the other hand, temporal differentiation followed by half-wave or full-wave
rectification suffices to expose the motion of the contrast-reversing squarewave to standard
analysis (Figs. 4b, 4b1, 4b2, 4b3), but fails to reveal the motion of the amplitude-modulating
squarewave (Figs. 3b, 3b1, 3b2, 3b3).

A single transformation which reveals the motion of both stimuli 3.1 and 3.2 to standard motion
analysis can easily be obtained by letting f* of Eq. (1) be a temporal linear filter (spatial
component = identity) with impulse response given by Fig. 5.

The result of applying such a filter to the contrast-modulating squarewave is shown in Fig. 3c. As
Figs. 3c1, 3c2, and 3c3 suggest, full- or half-wave rectification of the output (Fig. 3c) exposes
the motion of the contrast-modulating squarewave to standard analysis. And as Figs. 4c, 4c1, 4c2
and 4c3 indicate, the same transformations expose the motion of the contrast-reversing squarewave
to standard analysis.
4. Motion carried purely by spatial texture.

There are many microbalanced random stimuli whose motion depends on the spatiotemporal modulation
of spatial texture. The most obvious transformations to expose texturally conveyed motion to
standard motion analysis are given by T(I) = r•(f * I), with the separable filter f* being purely
spatial (temporal component = identity). The spatial filter f* should be viewed as a
"texture-grabber". f* will respond with varying power throughout regions of the visual field,
depending on whether or not the texture to which it is tuned populates those regions. However, the
output of a linear filter to a texture is positive or negative according to the phase of the
texture. That is, multiplying the contrast of the texture by -1 will multiply the filter's output
by -1. The purpose of rectification is to report the presence or absence of texture, independent
of phase. The result T(I) is a spatiotemporal function whose value reflects the movement of the
f*-texture across the visual field as a function of time. Elaborations of this scheme have been
applied to modeling texture perception by Caelli [32], Bergen & Adelson [33], and Sutter, Beck &
Graham [34].
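The role of rectification after a texture-grabbing filter can be illustrated in one spatial dimension. The cosine kernel, the grating periods, and the circular boundary below are assumptions made for this sketch; they are not the filters used in the paper or in the cited models.

```python
import numpy as np

def grating(width, period, phase=0.0):
    """A 1-D cosine texture with the given period (in pixels)."""
    x = np.arange(width)
    return np.cos(2 * np.pi * x / period + phase)

def texture_grabber(row, period):
    """Linear filter tuned to `period`: circular correlation with one cosine cycle."""
    kernel = np.cos(2 * np.pi * np.arange(period) / period)
    return sum(k * np.roll(row, -j) for j, k in enumerate(kernel))

WIDTH = 64
tuned = grating(WIDTH, period=8)      # texture the filter is tuned to
untuned = grating(WIDTH, period=4)    # texture of a different spatial frequency

resp_tuned = texture_grabber(tuned, period=8)
resp_reversed = texture_grabber(-tuned, period=8)   # contrast-reversed texture
resp_untuned = texture_grabber(untuned, period=8)
```

Reversing the contrast of the texture flips the sign of the linear response, so only the rectified response is unchanged by the reversal; the untuned texture produces essentially no response in this example.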

To study texturally conveyed motion, it is important to bypass not only first-order motion
mechanisms, but also irrelevant second-order mechanisms, such as the temporal mechanisms proposed
above for accessing the motion of the amplitude-modulating and contrast-reversing squarewaves
(stimuli 3.1 and 3.2). A particular subclass of microbalanced random stimuli serves this purpose.

4.1. Random stimuli microbalanced under all pointwise transformations.

Many signal transformations encountered in perceptual models can be expressed as cascades of
pointwise (r•) and space/time separable LSI transformations (f*). For visual processing that is
limited to such cascades, the following question is of considerable interest: What conditions must
be
The motion carried by the contrast-reversing squarewave J is exposed by rectifying the output of
certain LSI transformations (e.g. the temporal derivative) of J. However, no pointwise
transformation applied by itself to J suffices to expose J's motion content to standard analysis.
In studying the processing stages that mediate second-order motion perception, it may be of some
importance to know that a given stimulus is "immune" to a certain transformation or a certain type
of transformation (as J is immune to pointwise transformations). This motivates the following
notion [27]:

4.1.1. Call any random stimulus I microbalanced under a given transformation T iff T(I) is
microbalanced.

In connection with pointwise transformations, we have the following two results [27]:

4.1.2. Let I be a random stimulus such that, for any (x,y,t), (x',y',t') ∈ Z^3, I[x,y,t] and
I[x',y',t'] have a continuous joint density. Then the following conditions are equivalent:

1. I is microbalanced under all pointwise transformations.

2. The joint density f of I[x,y,t] with I[x',y',t'] and the joint density g of I[x,y,t'] with
I[x',y',t] satisfy

$$f(p,q) + f(q,p) = g(p,q) + g(q,p)$$

for all p, q ∈ R.

4.1.3. (Corollary) For any random stimulus I, if the joint density of I[x,y,t] with I[x',y',t'] is
identical either to the joint density of I[x,y,t'] with I[x',y',t] or to the joint density of
I[x',y',t] with I[x,y,t'] for every (x,y,t), (x',y',t') ∈ Z^3, then I is microbalanced under all
pointwise transformations.
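The step from 4.1.2 to the corollary is short, and can be written out as follows. This is a reader's sketch that assumes the continuous-joint-density hypothesis of 4.1.2 holds; here f is the joint density of I[x,y,t] with I[x',y',t'] and g is the joint density of I[x,y,t'] with I[x',y',t], so the two alternatives in 4.1.3 read f(p,q) = g(p,q) and f(p,q) = g(q,p) for all p, q.

```latex
\begin{align*}
\text{Case 1: } f(p,q) = g(p,q) \ \ \forall p,q
  &\;\Rightarrow\; f(p,q) + f(q,p) = g(p,q) + g(q,p), \\
\text{Case 2: } f(p,q) = g(q,p) \ \ \forall p,q
  &\;\Rightarrow\; f(p,q) + f(q,p) = g(q,p) + g(p,q).
\end{align*}
```

In both cases condition 2 of 4.1.2 is satisfied, so I is microbalanced under all pointwise transformations.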

4.2. Texture quilts.

The results of section 4.1 can easily be applied to construct a wide variety of stimuli for which
the first effective stage of processing for motion involves a non-pointwise transformation; no
pointwise transformation would reveal the motion of I to standard motion analysis.

All the examples of this section exploit the same essential trick: briefly displayed patches of
static, random-phased texture occur in specific spatiotemporal relations to each other, and
appropriate measures are taken to ensure that the resulting stimulus is microbalanced under all
pointwise transformations. We call such stimuli texture quilts. The texture quilts constructed in
our examples (exemplars are shown in Figs. 6b, 7d, 8b and 8c) all display decisive apparent motion
from left to right, when viewed either monocularly or binocularly from a distance such that they
span about 4 horizontal retinal degrees, with frames displayed at 15 Hz.

Binary texture quilts.

The easiest constructions of quilts that are microbalanced under all purely temporal
transformations use stimuli that have only two contrast values. We show how to construct a generic
binary-valued quilt and provide some specific examples.

4.2.1. A general technique for constructing binary texture quilts that are microbalanced under all
purely temporal transformations. Let α be a set of points in space (those which will take nonzero
values at some time during the display). For the number N of frames comprising the quilt,
associate with frames 1 through N a family

$$\beta_i, \quad i = 1, 2, \ldots, N$$

of jointly independent random variables, each of which takes the value 1 or -1 with equal
probability. In addition, associate with frames 1 through N a family

$$f_i, \quad i = 1, 2, \ldots, N$$

of functions, with $f_i$ assigning 0 throughout all frames except frame i, and within frame i
assigning either 1 or -1 to every point of α. Then construct the stimulus

$$B = \sum_{i=1}^{N} \beta_i f_i.$$

It is easily derived from corollary 4.1.3 that B is microbalanced under all pointwise
transformations. The proof that B is microbalanced under all purely temporal transformations is in
[27].

4.2.2. Stimulus: The sidestepping, randomly contrast-reversing vertical edge. Figure 6b displays
frames comprising a particularly simple binary texture quilt. Note that the representation of
Fig. 6b conflates time and vertical space. The representation in Fig. 6b is precisely equivalent
to a strip of movie film with frames arranged vertically above each other, separated by grey
lines. Between successive grey lines is displayed the actual two-dimensional luminance function
displayed to subjects. Fig. 6a shows the functions $f_1, f_2, \ldots, f_9$. $f_1$ assigns the
value -1 to all points (x,y,t) within the spatiotemporal block of the first frame, and 0 to all
other points. $f_2$ assigns the value 1 to the points in the leftmost eighth of the second frame,
the value -1 to the points in the right seven eighths, and 0 to all points outside the second
frame. The functions coloring successive frames shift the bright/dark edge rightward through the
frame until, in frame 9, the field is uniformly bright. Multiplying each frame i = 1, 2, ..., 9 by
its associated random variable $\beta_i$ yields, in this particular realization, the stimulus
given in Fig. 6b.

Figure 6: Edge-driven motion from an ordinary edge and from a binary texture quilt. (a) A
rightward moving light/dark edge visible to first- and second-order motion detectors. Nine frames
are shown; each frame shows exactly what is displayed: an area of contrast +1 and an area of
contrast -1. (b) A realization of the sidestepping, randomly contrast-reversing vertical edge.
This random stimulus is microbalanced under all purely temporal transformations; therefore its
rightward motion remains inaccessible to standard motion analysis even after an arbitrary, purely
temporal transformation. Each of the frames 1-9 of (b) was derived from the corresponding frame of
(a) by multiplying that entire frame by a random variable that takes the value 1 or -1 with equal
probability. The nine frame random variables are jointly independent.
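The construction of technique 4.2.1, applied to the stepping edge of stimulus 4.2.2, can be sketched on a single horizontal row of pixels. The row width and the simple spatial-difference filter are assumptions made for illustration; the sketch only shows why a spatial filter followed by rectification removes the random frame signs.

```python
import numpy as np

# Hypothetical sizes: nine frames, a 64-pixel row, edge advancing 1/8 of the row per frame.
N_FRAMES, WIDTH = 9, 64

def edge_frames():
    """Rows f_i: +1 left of the edge, -1 right of it (frame 1 all -1, frame 9 all +1)."""
    x = np.arange(WIDTH)[None, :]
    edge = (np.arange(N_FRAMES) * WIDTH // 8)[:, None]   # edge positions 0, 8, ..., 64
    return np.where(x < edge, 1.0, -1.0)

def binary_quilt(rng):
    """B = sum_i beta_i f_i, stored one frame per row, with beta_i = +/-1 at random."""
    beta = rng.choice([-1.0, 1.0], size=N_FRAMES)
    return beta[:, None] * edge_frames()

def edge_detector(stim):
    """Spatial difference along x followed by full-wave rectification."""
    return np.abs(np.diff(stim, axis=1))

quilt = binary_quilt(np.random.default_rng(0))

pointwise = np.abs(quilt)                       # full-wave rectification alone: uniform field
edges_quilt = edge_detector(quilt)              # rectified spatial-filter output for the quilt
edges_plain = edge_detector(edge_frames())      # same output for the ordinary edge of Fig. 6a
```

Rectifying the quilt directly yields a uniform field, while the rectified output of the spatial filter marks the edge position in every frame exactly as it does for the ordinary (unrandomized) edge.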
Figure 7: Orientation-driven second-order motion from a binary texture quilt. (a) Four frames of a
probabilistically defined sinewave grating that steps rightward 90 degrees between frames. The
rightward motion in (a) is accessible to all motion detectors. (b1) Four frames of a static,
vertical squarewave grating; (b2) Four frames of a static, horizontal squarewave grating. (c) A
rightward translating texture pattern. For every white point in (a), the corresponding value in
(c) is chosen from the vertical squarewave grating in (b1); for every black point, the
corresponding value in (c) is chosen from the horizontal squarewave grating in (b2). Stimulus (c)
is not microbalanced; its motion is accessible to standard motion analysis. (d) A texture quilt.
The frames of (d) are derived by multiplying the corresponding frames of (c) by jointly
independent random variables, each of which takes the value 1 or -1 with equal probability. The
texture quilt realized in (d) is microbalanced under all purely temporal transformations;
therefore its rightward motion is unavailable to standard motion analysis, even after an
arbitrary, purely temporal transformation.


The motion displayed by this quilt is clearly driven by the randomly contrast-reversing edge that
steps from left to right through the course of the display. Almost any bandpass spatial filter
followed by a rectifier will suffice to expose this motion to standard analysis. The following
quilt requires a more specifically tuned texture-grabbing spatial filter.

4.2.3. Stimulus: Oppositely oriented, randomly contrast-reversing squarewaves selected by a
drifting grating. In Fig. 7d are displayed the four frames comprising another binary texture quilt
also constructed using technique 4.2.1. Figure 7c shows the functions $f_1$, $f_2$, $f_3$, and
$f_4$ used in the construction. Each of these frames was constructed by using the corresponding
frame of the probabilistically defined, rightward stepping sinusoid of Fig. 7a to sample between
the two squarewave gratings shown in Figs. 7b1 and 7b2. The texture quilt realized in Fig. 7d is
derived by randomly reversing the contrast of each of the frames of Fig. 7c. For the realization
given in Fig. 7d, the random variables $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ used to
multiply the frames of Fig. 7c take the values -1, -1, 1, and 1 respectively.

Sinusoidal texture quilts.

It is simple to elaborate technique 4.2.1 to a method for constructing quilts involving textures
of arbitrarily many contrast values. We illustrate the principle in the construction of a generic
quilt comprised of patches of sinusoidal grating and we provide two specific examples.
4.2.4. A general technique for constructing sinusoidal texture quilts microbalanced under all
purely temporal transformations. A generic sinusoidal quilt has N frames. Pixels of each frame are
filled by choosing between a pair of sinusoids assigned to that frame. The critical constraints
(to insure that the resulting stimulus will be microbalanced under all purely temporal
transformations) are that the different sinusoids thus patched together, within a given frame and
across different frames, must be of equal amplitude and have jointly independent, uniformly
distributed random phases.

Specifically, for i = 1, 2, ..., N, with N the number of frames comprising the quilt, let $W_i$ be
a function, temporally constant within frame i, assigning either 1 or -1 to all points (x,y,t) in
the frame, and 0 to all points outside the i-th frame. We use $W_i$ to sample between static
sinusoidal gratings with random phases and different spatial frequencies. Apparent motion can
often be generated with such displays by shifting each successive sampling function $W_{i+1}$ in a
fixed direction relative to $W_i$.

Let

$$\omega_1, \theta_1, \omega'_1, \theta'_1, \omega_2, \theta_2, \omega'_2, \theta'_2, \ldots, \omega_N, \theta_N, \omega'_N, \theta'_N$$

be integers. For each frame i of the texture quilt being constructed we shall use $W_i$ to sample
between two sinusoids, $C_i$ and $C'_i$. For some integer P (independent of frame), $C_i$ has a
spatial frequency of $\omega_i/P$ cycles per horizontal pixel and $\theta_i/P$ cycles per vertical
pixel. $C'_i$ has a spatial frequency of $\omega'_i/P$ cycles per horizontal pixel and
$\theta'_i/P$ cycles per vertical pixel.

The phases of all of the sinusoids patched together in the quilt are independent random variables.
To be precise, let

$$\rho_1, \rho'_1, \rho_2, \rho'_2, \ldots, \rho_N, \rho'_N$$

be jointly independent random variables, each assuming with equal probability a value from amongst
the integers 0, 1, ..., P-1. Then for all (x,y,t) ∈ Z^3 set

$$S = \sum_{i=1}^{N} S_i,$$

where, for each i,

$$S_i[x,y,t] = \begin{cases} \cos\big(2\pi(\omega_i x + \theta_i y + \rho_i)/P\big) & \text{if } W_i[x,y,t] = 1, \\ \cos\big(2\pi(\omega'_i x + \theta'_i y + \rho'_i)/P\big) & \text{if } W_i[x,y,t] = -1, \\ 0 & \text{otherwise.} \end{cases}$$

Like the generic binary texture quilt B, S is microbalanced under all purely temporal
transformations [27].
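A minimal sketch of the construction in 4.2.4 follows. Several things here are assumptions rather than details confirmed by the text: the frame-stack array layout, the primed/unprimed naming of the two sinusoids, the drifting squarewave used as the sampling function, and the example values of P and F.

```python
import numpy as np

def sinusoidal_quilt(W, omega, theta, omega_p, theta_p, P, rng):
    """Build the frames of S from sampling functions W of shape (N, height, width).

    W[i] holds +1/-1 at every pixel of frame i. Where W[i] == 1 the pixel is
    taken from C_i (frequencies omega[i], theta[i]); elsewhere from C'_i
    (frequencies omega_p[i], theta_p[i]). Each sinusoid gets an independent
    random phase drawn uniformly from 0..P-1.
    """
    n_frames, height, width = W.shape
    y, x = np.mgrid[0:height, 0:width]
    frames = np.empty(W.shape, dtype=float)
    for i in range(n_frames):
        rho, rho_p = rng.integers(0, P, size=2)
        c = np.cos(2 * np.pi * (omega[i] * x + theta[i] * y + rho) / P)
        c_p = np.cos(2 * np.pi * (omega_p[i] * x + theta_p[i] * y + rho_p) / P)
        frames[i] = np.where(W[i] == 1, c, c_p)
    return frames

# Hypothetical example: 4 frames of 32x32 pixels, P = 16, F = 2.
N, H, WD, P, F = 4, 32, 32, 16, 2
xs = np.arange(WD)[None, :]
# Sampling functions: a +1/-1 vertical squarewave stepping 4 pixels rightward per frame.
W = np.stack([np.where(((xs - 4 * i) % 16) < 8, 1, -1) * np.ones((H, 1), dtype=int)
              for i in range(N)])

# Horizontal gratings where W == 1, vertical gratings where W == -1.
quilt = sinusoidal_quilt(W, [0] * N, [F] * N, [F] * N, [0] * N, P,
                         np.random.default_rng(0))
```

Each frame is stored as its own 2-D slice, so the "0 otherwise" case of the definition (points outside frame i) is implicit in the array layout.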

4.2.5. Stimulus: Oppositely oriented, random-phased sinusoids selected by a drifting grating. The
sinusoidal analog to the binary texture quilt of Fig. 7d is shown in Fig. 8b. In Fig. 8a are shown
the functions $W_1$, $W_2$, $W_3$ and $W_4$ used to select between horizontal and vertical
gratings. For this quilt, $\omega_i = \theta'_i = 0$ for i = 1, 2, 3, 4; and for some integer F
(with F/P the number of cycles per pixel), $\omega'_i = \theta_i = F$.

The motion displayed by the texture quilt of Fig. 8b evidently depends on the difference in
orientation between the textures mixed in each frame. Of course, we can just as easily keep
orientation constant and vary spatial frequency instead.

4.2.6. Stimulus: Random-phased sinusoids of different spatial frequencies selected by a drifting
grating. Figure 8c shows a texture quilt using the sampling functions of Fig. 8a,
bet  sc:^g©,-=:6,*  ssT&i  s20,-  fori  =  1,2.....  4. 

The empirical observations with texture quilts are that motion can be perceived when texture
patches move across the field, even when the texture-conveyed motion is contrived so that there
are no spatiotemporal correlations in the stimulus to support standard motion analysis [11,17],
and when second-order temporal processing can be excluded [27]. These texture-conveyed motions are
detected by convolving the input stimulus with a spatial texture-grabbing filter tuned to the
moving texture, then rectifying the output of the filter (to indicate the presence or absence of
the texture), and subjecting the rectified output to standard motion analysis. That supraordinate
texture orientation is easily perceived in the x,y representations of the texture-conveyed motion
(Figs. 7d, 8b and 8c) indicates that there exists second-order orientation processing of textures
in the space domain analogous to the second-order motion processing of textures in the motion
domain.

5. Summary.

Section 1 introduced the distinction between first- and second-order motion mechanisms. Section 2
reviewed the fundamental results concerning drift-balanced and microbalanced random stimuli.
Microbalanced random stimuli are useful in the study of second-order motion perception because (i)
they are guaranteed not to systematically stimulate first-order (Fourier-energy analytic or
autocorrelational) motion mechanisms, and (ii) it is easy to produce microbalanced random stimuli
that display consistent, compelling apparent motion across independent realizations.

Section 3 described microbalanced random stimuli that displayed different types of apparent
motion. The contrast-modulating squarewave (stimulus 3.1) suggests that some instances of
microbalanced motion may be exposed to standard motion analysis by simple rectification. The
contrast-reversing squarewave (stimulus 3.2) suggests that other instances of microbalanced motion
are exposed by rectifying the temporal derivative of the stimulus. Moreover, the motion of
stimulus 3.1 cannot be exposed by temporal differentiation followed by rectification, whereas the
motion of stimulus 3.2 cannot be exposed by simple rectification. A temporal filter with the
impulse response given in Fig. 5 (including terms for both temporal differentiation and temporal
identity), followed by rectification, does suffice to expose the motion of both stimuli 3.1 and
3.2 to standard motion analysis. For each of these stimuli, the optimal spatial filter to expose
the motion is the identity.

Section 4 introduced the notion of a random stimulus microbalanced under all pointwise
transformations. Section 4.1 provided necessary and sufficient conditions for a random stimulus to
be of this sort. Such stimuli I are significant because pointwise transformations applied directly
to I merely result again in microbalanced random functions; thus the first transformation in any
respect effective at exposing I's motion to analysis must be non-pointwise. If the transformations
applied to the visual signal are limited to cascades of (i) linear shift-invariant operators and
(ii) pointwise operators, then the first processing stage effective in revealing the motion of I
must be a nontrivial linear transformation. Moreover, since I is microbalanced, this linear filter
must be followed by at least a pointwise nonlinearity for I's motion to be revealed to standard
analysis.

Figure 8: Sinusoidal texture quilts: motion driven by differences in orientation or in spatial
frequency. The 4 frames in (a) are used to select between two sinusoidal patterns. Stimuli (b) and
(c) are realizations of two such random stimuli, each of which is microbalanced under all purely
temporal transformations. The sinusoids mixed in (b) differ in orientation, while the sinusoids
mixed in (c) have the same orientation, but differ in spatial frequency. The phases of sinusoids
are jointly independent across frames, and across sinusoids of different frequency mixed in the
same frame.

Section 4.2 illustrated random stimuli, texture quilts (stimuli 4.2.2, 4.2.3, 4.2.5 and 4.2.6),
that yielded compelling texture-conveyed apparent motion. These stimuli were microbalanced under
all purely temporal transformations. Their motion cannot be exposed by simple rectification, nor
indeed by any purely temporal transformations, no matter how nonlinear. The perception of texture
quilt motion can be modeled in terms of a spatial texture-grabbing filter followed by
rectification and standard motion analysis. Thus, the minimal system to account for all the
demonstrations of second-order motion perception presented here would consist of a temporal filter
that has both an identity and a temporal differentiation component, a band-selective spatial
filter followed by a rectifier and standard motion analysis.
To some degree, all current models of visual motion-perception mechanisms depend on the power of
the visual signal in various spatiotemporal-frequency bands. Here we show how to construct
counterexamples: visual stimuli that are consistently perceived as obviously moving in a fixed
direction yet for which Fourier-domain power analysis yields no systematic motion components in
any given direction. We provide a general theoretical framework for investigating non-Fourier
motion-perception mechanisms; central are the concepts of drift-balanced and microbalanced random
stimuli. A random stimulus S is drift balanced if its expected power in the frequency domain is
symmetric with respect to temporal frequency, that is, if the expected power in S of every
drifting sinusoidal component is equal to the expected power of the sinusoid of the same spatial
frequency, drifting at the same rate in the opposite direction. Additionally, S is microbalanced
if the result WS of windowing S by any space-time-separable function W is drift balanced. We prove
that (i) any space-time-separable random (or nonrandom) stimulus is microbalanced; (ii) any linear
combination of pairwise independent microbalanced (respectively, drift-balanced) random stimuli is
microbalanced and drift balanced if the expectation of each component is uniformly zero; (iii) the
convolution of independent microbalanced and drift-balanced random stimuli is microbalanced and
drift balanced; (iv) the product of independent microbalanced random stimuli is microbalanced; and
(v) the expected response of any Reichardt detector to any microbalanced random stimulus is zero
at every instant in time. Examples are provided of classes of microbalanced random stimuli that
display consistent and compelling motion in one direction. All the results and examples from the
domain of motion perception are transposable to the space-domain problem of detecting orientation
in a texture pattern.
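The drift-balance definition can be made concrete with a discrete Fourier transform of an xt array. The function below is a sketch for a single, nonrandom stimulus (for a random stimulus the power would have to be averaged over realizations); the array sizes and the two test stimuli are assumptions chosen for illustration.

```python
import numpy as np

def power_asymmetry(stim):
    """Largest difference between power at (w, k) and at (-w, k), relative to peak power.

    stim has shape (frames, positions); w indexes temporal frequency and
    k spatial frequency. A value near 0 means the power spectrum is
    symmetric with respect to temporal frequency.
    """
    power = np.abs(np.fft.fft2(stim)) ** 2
    mirrored = np.roll(power[::-1, :], 1, axis=0)   # mirrored[w, k] = power[-w, k]
    return np.max(np.abs(power - mirrored)) / np.max(power)

T, X = 16, 64
t = np.arange(T)[:, None]
x = np.arange(X)[None, :]
rng = np.random.default_rng(0)

# A space-time-separable stimulus: g(x) * h(t).
separable = rng.standard_normal(T)[:, None] * rng.standard_normal(X)[None, :]

# A single drifting sinusoidal grating.
drifting = np.cos(2 * np.pi * (3 * x / X + 2 * t / T))

asym_separable = power_asymmetry(separable)
asym_drifting = power_asymmetry(drifting)
```

The separable stimulus has a temporally symmetric power spectrum, in line with result (i), whereas the drifting grating concentrates its power at one sign of temporal frequency for each spatial frequency.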


1.  INTRODUCTION 

Central  to  the  study  of  human  visual  motion  perception  is 
the  relationship  between  perceived  motion  and  the  Fourier 
transform  of  the  spatiotemporal  visual  stimulus  Points  in 
the  domain  of  the  spatiotemporal  Fourier  transform  corre¬ 
spond  to  drifting  sinusoidal  gratings.  For  a  wide  range  of 
spatial  and  temporal  frequencies,  such  drifting  sinusoids  are 
perceived  to  move  uniformly  across  the  visual  field,  and 
their  apparent  speed  and  direction  are  direct  functions  of 
spatiotemporal  frequency.  For  the  most  part,  the  motion 
displaced  by  simple  linear  combinations  of  such  gratings 
reflects  quite  reasonably  the  individual  contributions  of  the 
components,*-* 

Indeed, current models of human motion perception implicitly
or explicitly involve some degree of Fourier decomposition
(bandpass filtering) of the image stream.*'^ Generally,
of course, the decomposition is localized to finite temporal
intervals and subregions of the visual field.

It has long been realized, however, that certain sorts of
apparent motion cannot be understood directly in terms of
their power spectra.*'*^ For instance, much attention has
been focused on sums of drifting gratings of slightly different,
high spatial frequencies.*^** In general, the perceived
velocity of such stimuli is determined not directly by the
frequencies of the summed components but by the pattern
of beats at their difference frequency.

Sperling'^ demonstrated "movement without correlation"
in a different stimulus whose Fourier transform, when computed
globally or locally, contained no consistent moving
components and yet was perceived to move decisively in a
fixed direction. Subsequently, Petersik et al. studied similar
displays in an effort to clarify the relationship between
stage 1 (autocorrelational, Fourier) mechanisms and the
higher-order stage 2 mechanisms mediating the perception
of what we call** non-Fourier motion.

The purpose of this paper is to provide (i) a general theoretical
basis and (ii) an array of specific tools for studying
non-Fourier motion-perception mechanisms.**


2.  ANALYZING  A  STIMULUS:  INTUITIVE 
FOURIER  DECOMPOSITION 

We begin with a brief, informal discussion to show how
particular motion stimuli can be analyzed into drifting sinusoids.
For illustration we use one-dimensional spatiotemporal
stimuli that move either to the left or to the right and
whose luminance varies in only the horizontal dimension,
although all the results that we derive apply in all cases to
stimuli of two spatial dimensions and time. A one-dimensional,
horizontally moving stimulus is represented conveniently
by a two-dimensional function I(x, t), where x (the
horizontal axis) indicates the spatial pattern of luminance
and t (the vertical axis, with time increasing upward) indicates
the temporal luminance pattern. In this representation,
usually it is immediately obvious which way the dominant
Fourier components of I tend to slope (up and to the left
or up and to the right). For example, Fig. 1a represents a
single frame of a white vertical bar, extended up and down
through the field of vision. Figure 1b shows the space-time
representation of the bar in Fig. 1a, which appears at the left
at time zero and moves at a constant rate to the right during
the time course of the display.

Fig. 1. Spatiotemporal Fourier analysis of a rightward-stepping bar. The abscissa represents horizontal space, the
ordinate represents time. a, One frame of a movie of a rightward-stepping vertical bar. b, Horizontal-temporal
cross section of a rightward-stepping vertical bar. c, Approximation to the rightward-stepping bar obtained by
taking an equally weighted sum of $\{\cos(2\pi n(x/X - t/T)) \mid n = 1, 2\}$. d, Approximation to the
rightward-stepping bar obtained by taking an equally weighted sum of
$\{\cos(2\pi n(x/X - t/T)) \mid n = 1, 2, \ldots, 12\}$.

For the moment, we shall generalize broadly, using the
word sum to describe both finite and countable summations
as well as integrations over bounded and unbounded real
intervals. In this case, we can do approximate justice to
some basic facts about visual stimuli and their Fourier transforms
without getting bogged down in technicalities. Any spatiotemporal
stimulus I can be decomposed into a weighted sum
of appropriately phase-shifted, drifting sinusoidal gratings.
Moreover, this sum is unique; that is, there is only one
assignment of weights and phases to drifting gratings that
recaptures I in the corresponding sum.

Indeed, the Fourier transform of I is often defined to be
the function that makes this assignment. There are, however,
various other commonly encountered equivalent definitions
of the Fourier transform (one of which we shall shortly
adopt) that may be more convenient for certain purposes.

Example:  Fourier  Components  of  a  Rightward-Stepping 
Vertical  White  Bar 

Most of the action of the moving bar stimulus I defined by
Figs. 1a and 1b takes place along the line $L = \{(x, t) \mid x = t\}$ in
Fig. 1b; that is, the points at which I deviates most from its
mean value are along this line. For our purposes, the most
useful indicator of where the action is in a given stimulus I is
the squared deviation of I from its overall mean value at each
point in its domain. As is clear, I deviates most energetically
from its mean along the line L.

What spatiotemporal sinusoidal gratings are weighted
most heavily in the Fourier sum yielding I? A good way to
answer this question is to ask another: What gratings can be
shifted in phase so as to match I most closely? Those sinusoids
that can be shifted so as to have high values where I
has high values and low values where I has low values are the
ones that will figure most heavily in the weighted sum composing
I. In short, those gratings that can be phase shifted
so as to correlate best with I will have the highest amplitudes
(weights) in the sum.

Fig. 2. Spatiotemporal Fourier analysis of stimulus h, a rightward-stepping, contrast-reversing vertical bar.
a, Horizontal-temporal cross section of h. b, Horizontal-temporal cross section of a vertical, leftward-drifting
sinusoid, which correlates well with h: $\cos(2\pi(2x/X + 2t/T) - \pi/2)$. c, Horizontal-temporal cross section
of a more slowly leftward-drifting sinusoid, which also correlates well with h:
$\cos(2\pi(3x/X + t/T) - \pi/2)$.

The sinusoidal gratings that correlate best with I(x, t) of
Fig. 1b are those that assume the value 1 along the line L,
that is, all the sinusoids in the set

$$\Omega = \{\cos(\alpha x - \alpha t) \mid \alpha \in \mathbb{R}\}.$$

Figures 1c and 1d illustrate how I is approximated more and
more closely by taking sums involving more and more
(equally weighted) elements of $\Omega$.
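
The approximation shown in Figs. 1c and 1d can be sketched numerically. The following is a minimal illustration, assuming a square grid of 32 columns by 32 frames with X = T = 32; the grid size and the function name `cosine_sum` are choices made here for illustration and do not come from the paper.

```python
import numpy as np

def cosine_sum(n_terms, size=32):
    """Equally weighted sum of cos(2*pi*n*(x/X - t/T)) for n = 1..n_terms.

    Returns an array indexed [x, t]; X = T = size is an assumption of this sketch.
    """
    x = np.arange(size)[:, None]                  # space index (axis 0)
    t = np.arange(size)[None, :]                  # time index (axis 1)
    n = np.arange(1, n_terms + 1)[:, None, None]  # harmonic number
    return np.cos(2 * np.pi * n * (x / size - t / size)).sum(axis=0)

approx_2 = cosine_sum(2)    # coarse approximation, as in Fig. 1c
approx_12 = cosine_sum(12)  # sharper approximation, as in Fig. 1d

# Every term equals 1 on the line x = t, so the sum peaks there.
peak_positions = approx_12.argmax(axis=0)
```

With more terms, the ridge along the line x = t becomes narrower relative to its surround, which is the sense in which the sum approaches the stepping bar.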

Example 1: Rightward-Stepping, Contrast-Reversing
Vertical Bar

Contrast-reversing stimuli are critical for understanding the
implications of Fourier analysis. Note first that, as in the
case of I defined in Fig. 1, most of the power of h in Fig. 2 is
centered along the line L. However, the elements of $\Omega$
contribute no power to h. To see this, note that the value of h
flipflops around the mean luminance along L, while the
value of any element $C \in \Omega$ remains constant; thus the value
of the product of h with C will flipflop (with h) around the
mean luminance over the points of L and will be zero everywhere
else. Consequently, the sum taken over all points (x,
t) of the product h(x, t)C(x, t) is zero. This is equivalent to
saying that the correlation of h with C is zero.
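
This cancellation is easy to check on a toy grid. The sketch below assumes a one-pixel-wide bar that steps one pixel per frame for eight frames and reverses contrast on every frame; these dimensions, and the helper name `correlation`, are illustrative assumptions rather than the stimulus of Fig. 2.

```python
import numpy as np

T = 8  # number of frames; even, so the contrast reversals cancel exactly
x = np.arange(T)[:, None]
t = np.arange(T)[None, :]

# Bar on the line x = t whose contrast flips sign on every frame; zero elsewhere.
h = np.where(x == t, (-1.0) ** t, 0.0)

def correlation(stimulus, alpha, beta, rho=0.0):
    """Sum over all (x, t) of stimulus * cos(alpha*x + beta*t + rho)."""
    grating = np.cos(alpha * x + beta * t + rho)
    return float((stimulus * grating).sum())

# A rightward-drifting grating cos(a*x - a*t) is constant (= 1) along x = t.
rightward = correlation(h, 1.3, -1.3)

# A leftward-drifting grating whose crests cross the line and alternate with h.
leftward = correlation(h, np.pi / 2, np.pi / 2)
```

Here `rightward` is zero while `leftward` equals the number of frames, so this contrast-reversing bar correlates with a leftward-drifting grating even though it steps to the right.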

On the other hand, the function

$$C(x, t) = \cos(\alpha x + \beta t + \rho)$$

correlates positively with h when $\alpha$ and $\beta$ are chosen so that
the crests and troughs of C slope across L and oscillate at an
appropriate frequency; $\rho$ can then be chosen to lay the
crests of C across the bright regions of h and the troughs
across the dark regions. Examples of sinusoids that correlate
well with h are given in Figs. 2b [$\cos(3x + t - \pi/2)$] and
2c [$\cos(2x + 2t - \pi/2)$].

Direction  of  Drift  in  Sinusoidal  Gratings 

For each nonnegative real number $\alpha$, $\cos(\alpha x - \alpha t)$ drifts
from left to right. By contrast, $\cos(\alpha x + \alpha t)$ drifts at the
same rate from right to left. For any $\omega, \tau, \rho \in \mathbb{R}$, if $\omega = 0$, the
grating

$$C(x, t) = \cos(\omega x + \tau t + \rho)$$

has constant value over space but oscillates in time with
frequency $\tau$. Otherwise (if $\omega \neq 0$) C drifts with speed $|\tau/\omega|$;
it drifts rightward if $\tau/\omega < 0$ and leftward if $\tau/\omega > 0$.
Accordingly, we call C rightward drifting if $\tau/\omega < 0$, leftward
drifting if $\tau/\omega > 0$, and stationary if $\tau = 0$.
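
The classification can be written as a small helper. This is only a sketch; the function name `drift` and the label returned for the spatially uniform case are chosen here for illustration.

```python
def drift(omega, tau):
    """Classify C(x, t) = cos(omega*x + tau*t + rho) by the sign of tau/omega.

    Returns (label, speed); speed is None when the grating has no spatial variation.
    """
    if omega == 0:
        # Constant over space, oscillating in time with frequency tau.
        return ("no spatial variation", None)
    if tau == 0:
        return ("stationary", 0.0)
    speed = abs(tau / omega)
    return ("rightward" if tau / omega < 0 else "leftward", speed)
```

For instance, `drift(2, -4)` reports a rightward drift at speed 2, and `drift(2, 4)` the same speed leftward.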

3. THE MOTION-FROM-FOURIER-
COMPONENTS PRINCIPLE

For any real-valued function f, the sum (taken over all
points in the domain of f) of the squared values of f is called
the power in f. Parseval's relation states that the power in f
is proportional to the sum of the squared amplitudes of the
sinusoids into which f can be (uniquely) decomposed.
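
Parseval's relation can be checked numerically for a discrete array. The sketch below uses NumPy's unnormalized discrete Fourier transform, for which the constant of proportionality is the number of samples; the random test array is an arbitrary stand-in for a stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))   # arbitrary real-valued array f[x, t]

# Power tallied point by point in space-time.
power_space_time = np.sum(f ** 2)

# Power tallied sinusoid by sinusoid; np.fft.fft2 is unnormalized,
# so the squared amplitudes are divided by the number of samples.
F = np.fft.fft2(f)
power_frequency = np.sum(np.abs(F) ** 2) / f.size
```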

Thus,  in  particular,  we  can  tally  up  the  power  in  a  dynamic 
visual  stimulus  either  point  by  point  in  space-time  or  drift¬ 
ing  sinusoid  by  drifting  sinusoid.  Of  course,  considering  the 
unambiguous,  uniform  apparent  motion  displayed  by  drift¬ 
ing  sinusoidal  gratings,  it  would  seem  to  make  more  sense  for 
a  motion-perception  system  to  do  its  power  accounting 
across  the  sinusoids  composing  the  stimulus. 

These considerations lead naturally to a commonly encountered
general rule for predicting the apparent motion of
an arbitrary horizontal stimulus I(x, t): For I considered as a
linear combination of sinusoidal gratings, compare the power
in I of the rightward-drifting gratings with the power of
the leftward-drifting gratings. If most of I's power is contributed
by rightward-drifting gratings, then perceived motion
should be to the right. If most of the power resides in the
leftward-drifting gratings, perceived motion should be to the
left. Otherwise I should manifest no decisive motion in
either direction.
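
The rule can be sketched for a discrete stimulus stored as an array indexed [x, t]. The functions below are illustrative, not the authors' procedure: `directional_power`, `mffc_prediction`, and the `margin` that decides what counts as "most of the power" are all assumptions of this sketch. The sign convention follows the text: a component is rightward drifting when its spatial and temporal frequencies have opposite signs.

```python
import numpy as np

def directional_power(stimulus):
    """Return (rightward power, leftward power) of stimulus[x, t]."""
    power = np.abs(np.fft.fft2(stimulus)) ** 2
    omega = np.fft.fftfreq(stimulus.shape[0])[:, None]  # signed spatial frequency
    tau = np.fft.fftfreq(stimulus.shape[1])[None, :]    # signed temporal frequency
    sign = np.sign(omega * tau)
    # tau/omega < 0: rightward; tau/omega > 0: leftward; zero product: neither.
    return float(power[sign < 0].sum()), float(power[sign > 0].sum())

def mffc_prediction(stimulus, margin=1.1):
    """Predict 'right', 'left', or 'none'; margin is a hypothetical threshold."""
    right, left = directional_power(stimulus)
    if right > margin * left:
        return "right"
    if left > margin * right:
        return "left"
    return "none"
```

A drifting grating yields a decisive prediction, whereas a counterphase grating (the product of a spatial and a temporal cosine) splits its power evenly and yields "none".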

This prediction rule for horizontally moving stimuli is a
restricted version of the motion-from-Fourier-components
(MFFC) principle. More generally, let L be any visual stimulus;
that is, $L: X \times Y \times T \to \mathbb{R}$, for bounded real intervals
X, Y, and T, where for any $(x, y, t) \in X \times Y \times T$, L(x, y, t) is
construed as the luminance of a point (x, y) in a visual field
at time t. A more general version of the MFFC principle is
as follows: For L to exhibit motion in a certain direction in
the neighborhood of some point $(x, y, t) \in \mathbb{R}^3$, there must be
some spatiotemporal volume A in some sense proximal to (x,
y, t) such that the Fourier transform of L computed locally
across A has substantial power over some regions of the
frequency domain whose points correspond, in the space-time
domain, to sinusoidal gratings whose direction of drift
is consonant with the motion perceived.

That any standard version of the MFFC principle cannot
account for all phenomena associated with human motion
perception was demonstrated by Sperling, who described
the following three-flash stimulus. Frame 0 is a rectangular
block of contiguous small squares, each of which is independently
painted black or white with equal probability. In
frame 1, a subblock B1 of frame 0 is scrambled (that is, in
frame 1, each component square within B1 is independently
repainted black or white with equal probability). In frame 2
a different subblock, B2, is scrambled. For many sizes of
rectangles and frame presentation rates, such a stimulus
elicits apparent motion in the direction from B1 to B2;
nonetheless, it is unlikely to correlate significantly with any given
spatiotemporal sinusoidal grating.
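
A stimulus of this kind is straightforward to generate. The sketch below makes several assumptions not fixed by the description above: black and white are coded as contrasts -1 and +1, B1 and B2 are adjacent full-height column bands, and frame 2 leaves frame 1 unchanged outside B2. The function name `three_flash` and all sizes are illustrative.

```python
import numpy as np

def three_flash(rng, height=16, width=32, block_width=8, start=8):
    """Return (frames, b1, b2): three frames and the column slices that were scrambled."""
    frame0 = rng.choice([-1, 1], size=(height, width))   # independent black/white squares
    b1 = slice(start, start + block_width)                # assumed location of B1
    b2 = slice(start + block_width, start + 2 * block_width)  # assumed location of B2

    frame1 = frame0.copy()
    frame1[:, b1] = rng.choice([-1, 1], size=(height, block_width))  # scramble B1

    frame2 = frame1.copy()  # assumption: frame 2 keeps frame 1 outside B2
    frame2[:, b2] = rng.choice([-1, 1], size=(height, block_width))  # scramble B2

    return np.stack([frame0, frame1, frame2]), b1, b2

frames, b1, b2 = three_flash(np.random.default_rng(0))
```

Successive frames differ only inside the scrambled band, so the only consistent structure is the displacement of the scrambled region from B1 to B2.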

It is our purpose here to build on these observations. We
shall first give precise formulation to the notion of a random
stimulus and then define a certain class of random stimuli
(the class of drift-balanced random stimuli) that is useful in
studying visual perception (since any motion displayed by a
drift-balanced random stimulus cannot be explained in
terms of the MFFC principle). We proceed to show that the
(spatiotemporal) convolution of two drift-balanced random
stimuli is drift balanced and mention some of the psychophysical
implications of this fact. In proposition 3 below we
prove that linear combinations of certain drift-balanced random
stimuli are themselves drift balanced (this result, which
is illustrated with a variety of stimulus examples, is particularly
useful in constructing drift-balanced random stimuli
that display consistent apparent motion across independent
realizations). In Section 7 we provide an alternative characterization
of the class of drift-balanced random stimuli in
terms of simple point-delay Reichardt detectors (or autocorrelation
coefficients) and apply this characterization to distinguish
the subclass of drift-balanced random stimuli that
we call microbalanced. A random stimulus I is microbalanced
if, for any space-time-separable function W, the result
WI of windowing I by W is drift balanced. We derive a
collection of basic results about microbalanced random
stimuli and show that, in fact, all the demonstration stimuli
previously defined (demonstrations 1-5 below) are microbalanced.
Among other things, we prove that the expected
response of any elaborated Reichardt detector^-^ to any
microbalanced random stimulus is zero at any instant in time.
Finally, we observe some salient psychophysical properties
of microbalanced random stimuli and discuss some of the
possible explanations of the non-Fourier motion elicited by
such stimuli.

4. PRELIMINARIES

In this paper we deal with properties of random stimuli.
Roughly speaking, a random stimulus is a jointly distributed
family of random variables assigned to a grid of locations
covering the visual field across time. In this section we
collect the tools appropriate for dealing with such objects.
This section is split into two subsections, one devoted to
continuous random variables, in which we introduce explicitly
some notation for handling integration and define a
density, and one devoted to discrete dynamic visual stimuli
and their Fourier transforms, in which we identify a stimulus
[an assignment of luminances (nonnegative, real values) to a
grid of points throughout visual space and time] with
its contrast modulation function (the normalized deviation
of luminance from its mean) and introduce frequency-domain
notation.

Continuous Random Variables
Our stimuli are real-valued, randomly varying functions of a
discrete domain. The luminances assigned to points (pixels)
are, in general, jointly distributed random variables.
The basic definitions and proofs that we present here presuppose
that these random variables are real valued and
continuous. (In general, the discrete-case analogs are simpler
and should be obvious.)

Let $\mathbb{Z}$ ($\mathbb{Z}^+$) denote the set of integers (positive integers),
and let $\mathbb{R}$ ($\mathbb{R}^+$) denote the real (positive real) numbers.

The following conventions are useful. As usual, call any
subset $\alpha \subset \mathbb{R}$ an interval if and only if (iff), for any $x, z \in \alpha$
and any $y \in \mathbb{R}$, if $x \le y \le z$, then $y \in \alpha$; more generally, for
any $k \in \mathbb{Z}^+$, call any subset $\beta \subset \mathbb{R}^k$ an interval of $\mathbb{R}^k$ iff $\beta$ is
the Cartesian product of (possibly unbounded) real intervals
$\beta_0, \ldots, \beta_{k-1}$. In this case, for any function $f: \mathbb{R}^k \to \mathbb{R}$, it
is convenient to indicate the integral of f over $\beta$, if it exists, as

$$\int_{\beta} f(i)\, di.$$

Moreover, we call any nonnegative, real-valued function f of
$\mathbb{R}^k$ a density iff f is integrable over $\mathbb{R}^k$ and

$$\int_{\mathbb{R}^k} f(i)\, di = 1.$$

Discrete  Dynamic  Visual  Stimuli  and  Their  Fourier 
Transforms 

Contrast Modulation

Luminance is physically constrained to be a nonnegative
quantity. Psychophysically, however, the significant quantity
is contrast, the normalized deviation at each time t of
luminance at each point (x, y) in the visual field from a base
level, or level of adaptation, which reflects the average luminance
over points proximal to (x, y, t) in space and time. We
shall restrict our attention throughout this paper to stimuli
for which it may be assumed that the base luminance level $\mu$
is uniform over the significant spatiotemporal locations in
the display. In practice, this condition is met if (i) subjects
are adapted sufficiently to a field of uniform luminance $\mu$
before the onset of non-$\mu$ luminances and (ii) the duration
over which non-$\mu$ luminances are displayed is sufficiently
brief.

For any stimulus L with base luminance $\mu$, call the function
I satisfying

$$L = \mu(1 + I) \qquad (1)$$

the contrast modulator of L (and note that $I \ge -1$).

Psychophysically, it is well known that, over substantial
ranges of $\mu$, the apparent motion of L does not depend on $\mu$.

1990 J. Opt. Soc. Am. A/Vol. 5, No. 11/November 1988

Thus the contrast modulator I of L emerges as a likely
function to analyze for the motion information carried by L.
Accordingly, we shall shift our focus from luminance to contrast
and identify L with its contrast modulator, dropping
reference to adaptation level.
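
Equation (1) can be inverted directly to obtain the contrast modulator from a luminance array. The helper below is a minimal sketch; the name `contrast_modulator` and the luminance units are illustrative.

```python
import numpy as np

def contrast_modulator(luminance, base):
    """Solve L = base * (1 + I) for I, given luminances L and base level `base` > 0."""
    return np.asarray(luminance, dtype=float) / base - 1.0

# Luminances of 0, 50, and 100 around a base level of 50 (arbitrary units).
example = contrast_modulator([0.0, 50.0, 100.0], 50.0)
```

Since luminance is nonnegative, the result never falls below -1, and a pixel at the base level has contrast 0.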

Specifically, we shall call any function $I: \mathbb{Z}^3 \to \mathbb{R}$ a stimulus
iff $I[x, y, t] = 0$ for all but finitely many points $(x, y, t) \in \mathbb{Z}^3$.

Strictly speaking, we should also require that I never drop
below -1. This restriction, however, would lead to unnecessary
complications in dealing with various sorts of combinations
of stimuli. In all cases the points that we wish to make
tolerate rescaling of stimuli by arbitrary multiplicative constants
to settle their minimal values to some perceptually
appropriate level between -1 and 0. Accordingly, we drop
the restriction that $I \ge -1$.

In general, we shall consider stimuli of two spatial dimensions
and time. The reader may find it convenient to think
of the first spatial dimension (which we shall always index by
x) as horizontal, with indices increasing to the right, and the
second spatial dimension (always indexed by y) as vertical,
with indices increasing upward. The temporal dimension is
always indexed by t.

Frames and Frame Blocks

For any stimulus I, we call the restriction of I to $\mathbb{Z}^2 \times \{t\}$ the
tth frame of I. In all the stimulus examples that we shall
consider, frames clump into blocks: specifically, for each
demonstration stimulus I defined in this paper, there are
integers k and N such that all changes in luminance occur in
frames kn, where n = 0, 1, ..., N, and otherwise luminance
remains constant across frames. The group of identical
frames between and including frames kn and kn + k - 1 we
shall call the nth frame block of I.

Any stimulus I is nonzero at only a finite number of points
in its countably infinite domain. Consequently, (i) the
mean value of I is 0, and (ii) the power in I is finite.

From property (ii) we observe that I has a well-defined
Fourier transform, which we denote by $\hat I$. Specifically,

$$\hat I(\omega, \theta, \tau) = \sum_{(x, y, t) \in \mathbb{Z}^3} I[x, y, t] \exp(-j(\omega x + \theta y + \tau t))$$

(analysis).

We shall always use square brackets around the arguments
of discrete functions and parentheses around the arguments
of continuous functions. Although $\hat I$ is defined for
all $(\omega, \theta, \tau) \in \mathbb{R}^3$, it is periodic over $2\pi$ in each variable. This
fact is reflected in the inverse transform:

$$I[x, y, t] = \frac{1}{(2\pi)^3} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \hat I(\omega, \theta, \tau) \exp(j(\omega x + \theta y + \tau t))\, d\omega\, d\theta\, d\tau$$

(synthesis).

In the Fourier domain we shall consistently use $\omega$ to index
frequencies relative to x, $\theta$ to index frequencies relative to y,
and $\tau$ to index frequencies relative to t. This convention is
exemplified by the definition of $\hat I$ above.

We distinguish the stimulus 0 by setting 0[x, y, t] = 0 for
all $x, y, t \in \mathbb{Z}$. In parallel, we let $\hat 0$ assign 0 to all
$(\omega, \theta, \tau) \in \mathbb{R}^3$.

5. DRIFT-BALANCED RANDOM STIMULI

We begin by generalizing the notion of a stimulus to that of a
random stimulus. Whereas a nonrandom stimulus assigns
fixed values to $\mathbb{Z}^3$, a random stimulus I assigns jointly distributed
random variables that deviate from zero at only a
finite number of points.

Various expectations associated with I are defined easily.
We shall be particularly interested in the expected power
of I at some point $(\omega, \theta, \tau)$ in the frequency domain:
$E[|\hat I(\omega, \theta, \tau)|^2]$. This reflects the expected power in I of a
sinusoid C that modulates contrast at the rate of $\omega/2\pi$ cycles
per column, $\theta/2\pi$ cycles per row, and $\tau/2\pi$ cycles per frame.
The sinusoid with the same spatial frequency as C and moving
at the same rate but in the opposite direction is obtained
simply by reversing the direction of C's temporal contrast
modulation, that is, by modulating contrast $-\tau/2\pi$ cycles
per frame. When the expected power in I of any given
drifting sinusoid is matched by the expected power of the
sinusoid of the same spatial frequency drifting at the same
rate in the opposite direction, we call I drift balanced.

Although the MFFC principle suggests that drift-balanced
random stimuli should not display consistent apparent
motion across independent realizations, we shall provide
examples of drift-balanced random stimuli (in Section 6)
that do in fact display strong, consistent motion across trials.

Beyond these basic developments, two propositions are
proved in this section. In proposition 1 we demonstrate that
any random stimulus separable in space and time (see definition
3 below) is drift balanced, and in proposition 2 we
show that the (spatiotemporal) convolution of any two independent,
drift-balanced random stimuli is drift balanced.
We now proceed more precisely as follows.

Definition 1: Random Stimulus

Call any family $I[x, y, t]$, $(x, y, t) \in \mathbb{Z}^3$, of random variables
jointly distributed with density f, a random stimulus when

(i) $I[x, y, t] = 0$ for all but a finite subset $\alpha \subset \mathbb{Z}^3$ and
(ii) $E[I[x, y, t]^2]$ exists for all $(x, y, t) \in \mathbb{Z}^3$.

Expectations Related to I

With k the cardinality of $\alpha$, we set up a one-to-one correspondence
between dimensions of $\mathbb{R}^k$ and points of $\alpha$ so that
each coordinate of any vector $i \in \mathbb{R}^k$ corresponds to one of
the points of $\alpha$. We can now treat i as a stimulus (whose
nonzero values are restricted to the points of $\alpha$). In particular,
letting $i_{(p,q,r)}$ denote the coordinate of i corresponding to
a given $(p, q, r) \in \alpha$, we set

$$i[x, y, t] = \begin{cases} i_{(x,y,t)} & \text{if } (x, y, t) \in \alpha \\ 0 & \text{otherwise} \end{cases}$$

for any $(x, y, t) \in \mathbb{Z}^3$. We can now conveniently formulate
various expectations associated with I; in particular, we define
the expectation of I by

$$E_I[x, y, t] = \int_{\mathbb{R}^k} i[x, y, t]\, f(i)\, di$$

for all $(x, y, t) \in \mathbb{Z}^3$. (Note that $E_I$ is a nonrandom stimulus.)
Consider the Fourier transform of $E_I$:

$$\hat E_I(\omega, \theta, \tau) = \sum_{(x, y, t) \in \mathbb{Z}^3} \left( \int_{\mathbb{R}^k} i[x, y, t]\, f(i)\, di \right) \exp(-j(\omega x + \theta y + \tau t)) = \int_{\mathbb{R}^k} \hat i(\omega, \theta, \tau)\, f(i)\, di = E[\hat I(\omega, \theta, \tau)].$$

This leads to the following observation.

Observation 1

The Fourier transform of the expectation of a random stimulus
I is equal to the expectation of the Fourier transform of I.

Note especially, here, the implication that $E[\hat I] = \hat 0$ whenever $E_I = 0$.

We call any random stimulus I invariant iff there exists a
stimulus S such that I = S with probability 1.

Example 2: Randomly Contrast-Reversing, Rightward-
Stepping Vertical Bar

Let the random stimulus I contain four frame blocks indexed
0, 1, 2, and 3, and let each frame block be composed of
a horizontal sequence of four rectangles indexed 0, 1, 2, and 3
from left to right. Let $\phi_0$, $\phi_1$, $\phi_2$, and $\phi_3$ be pairwise independent
random variables, each taking the value C or -C with
equal probability. Give rectangle i in frame block i the
value assumed by $\phi_i$, and give all other pixels the value 0.

The restriction of I to any one of its rows is characterized
by Fig. 3, as a function of x along the horizontal axis and t
along the vertical axis. As is clear, for any $(x, y, t) \in \mathbb{Z}^3$,

$$E[I[x, y, t]] = 0;$$

that is, $E_I = 0$, from which we infer that $E[\hat I] = \hat 0$.
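
The balance of expected power in this example can be verified exactly on a small grid by averaging over every sign assignment. The sketch below assumes one row of the stimulus, rectangles 2 pixels wide, frame blocks 2 frames long, C = 1, and fully independent signs (a special case of pairwise independence); the helper names are illustrative.

```python
import itertools
import numpy as np

def stepping_bar(signs, w=2, k=2):
    """One row of the stimulus, indexed [x, t]: rectangle i is shown in frame block i."""
    n = len(signs)
    I = np.zeros((n * w, n * k))
    for i, s in enumerate(signs):
        I[i * w:(i + 1) * w, i * k:(i + 1) * k] = s
    return I

def power(I):
    """Squared magnitude of the discrete Fourier transform of I[x, t]."""
    return np.abs(np.fft.fft2(I)) ** 2

def flip_tau(P):
    """Re-index P so that entry (omega, tau) holds the value at (omega, -tau)."""
    return np.roll(P[:, ::-1], 1, axis=1)

# Expected power: average over all 16 equally likely sign assignments.
all_signs = list(itertools.product([1, -1], repeat=4))
expected_power = np.mean([power(stepping_bar(s)) for s in all_signs], axis=0)

# For comparison, the ordinary (non-reversing) stepping bar.
plain_power = power(stepping_bar((1, 1, 1, 1)))
```

`expected_power` is unchanged by `flip_tau`, whereas `plain_power` is not: the ordinary stepping bar carries a net rightward bias in its spectrum, and the random contrast reversals remove that bias in expectation.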

An interesting fact that may not be so obvious, however,
(this follows from corollary 1 below) is that the expected
power contributed to I by any given drifting sinusoidal grating
is equal to the expected power contributed by the grating
of the same spatial frequency drifting at the same rate in the
opposite direction. This may seem surprising in light of the
MFFC principle, since any realization of I is marked by a
systematic, left-to-right perturbation across time, which (as
one might expect) tends, under appropriate viewing conditions,
to be perceived as motion from left to right. Indeed,
as we shall see in Section 6, it is quite easy to construct
random stimuli with this property that nonetheless display
striking, reliable apparent motion in a fixed direction.

This fact motivates a notion central to this paper: that of
a drift-balanced random stimulus (see definition 2 below).
As the name suggests, a drift-balanced random stimulus is
one for which the expected contribution of any given drifting
sinusoidal grating is balanced by (equal to) the expected
contribution of the corresponding grating drifting at the
same rate in the opposite direction. Of course, just as a
given random variable may have little or no probability of
assuming a value equal to its expectation, a particular realization
of a drift-balanced random stimulus, I, does not, in
general, have perfectly balanced components. However,
when gauged over a number of independent realizations, the
mean contribution of a particular Fourier component of I
tends to balance against the contribution of the corresponding,
oppositely moving component.

Fig. 3. Rightward-stepping, randomly contrast-reversing vertical
bar: a horizontal-temporal diagram of the random stimulus I, a
vertical bar that appears with contrast C or -C randomly assigned
and steps its width rightward three times over a zero-contrast visual
field, assuming contrast C or -C with equal probability with each
step. The expected power in I of any given drifting sinusoid is equal
to the expected power of the sinusoid of the same spatial frequency
drifting at the same rate but in the opposite direction.

Definition 2: Drift-Balanced Random Stimulus

Call any random stimulus I drift balanced iff, for any $\omega, \theta, \tau \in \mathbb{R}$,

$$E[|\hat I(\omega, \theta, \tau)|^2] = E[|\hat I(\omega, \theta, -\tau)|^2]. \qquad (2)$$

(For a proof that the expectations in Eq. (2) exist, see Appendix
A.) Notice that, because I is real valued, Eq. (2) is
equivalent to

$$E[|\hat I(\omega, \theta, \tau)|^2] = E[|\hat I(-\omega, -\theta, \tau)|^2];$$

that is, I is drift balanced iff the expected power in I of any
given drifting sinusoidal grating is equal to the expected
power of the grating with the same spatial frequency drifting
at the same rate but in the opposite direction.
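
The equivalence of the two forms rests on the conjugate symmetry of the transform of a real-valued stimulus. A short derivation, assuming the analysis transform of Section 4 and writing $\overline{z}$ for the complex conjugate of z:

```latex
\hat I(-\omega, -\theta, -\tau)
  = \sum_{(x,y,t) \in \mathbb{Z}^3} I[x,y,t]\, e^{+j(\omega x + \theta y + \tau t)}
  = \overline{\hat I(\omega, \theta, \tau)}
  \qquad \text{(since } I \text{ is real valued)},
\\[4pt]
\text{so, replacing } \tau \text{ by } -\tau:\quad
|\hat I(-\omega, -\theta, \tau)|^2 = |\hat I(\omega, \theta, -\tau)|^2 .
```

Taking expectations of the last identity shows that the right-hand sides of the two displayed conditions agree for every realization, and hence in expectation.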

As  we  shall  see  in  Section  6,  the  following  class  of  random 
stimuli  is  useful  in  constructing  drift-balanced  random  stim¬ 
uli  that  display  consistent  motion. 

Definition 3: Space-Time-Separable Random Stimulus
Call any random stimulus I space-time separable iff, for any
$(x, y, t) \in \mathbb{Z}^3$,

$$I[x, y, t] = g[x, y]\, h[t]$$

for jointly distributed real random functions g and h.

Immediately  we  note  a  simple  proposition. 

Proposition 1

Any space-time-separable random stimulus is drift balanced.

Proof

Let I be a space-time-separable random stimulus, with

$$I[x, y, t] = g[x, y]\, h[t]$$

for all $(x, y, t) \in \mathbb{Z}^3$; then

$$\hat I(\omega, \theta, \tau) = \hat g(\omega, \theta)\, \hat h(\tau).$$

Thus, since h is real valued,

$$|\hat I(\omega, \theta, \tau)|^2 = |\hat g(\omega, \theta)|^2\, |\hat h(\tau)|^2 = |\hat I(\omega, \theta, -\tau)|^2.$$

Taking expectations of both sides yields Eq. (2). ∎
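
The identity in the proof holds for every realization, which makes it easy to check numerically. The sketch below draws one arbitrary spatial pattern g and one arbitrary temporal modulation h; the array sizes and the use of Gaussian values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.standard_normal((6, 5))   # spatial pattern g[x, y]
h = rng.standard_normal(7)        # temporal modulation h[t]

# Space-time-separable stimulus I[x, y, t] = g[x, y] * h[t].
I = g[:, :, None] * h[None, None, :]

P = np.abs(np.fft.fftn(I)) ** 2

# Power re-indexed so that entry (omega, theta, tau) holds the value at (omega, theta, -tau).
P_flipped = np.roll(P[:, :, ::-1], 1, axis=2)

# The power factorizes into a spatial part and a temporal part.
P_factored = (np.abs(np.fft.fft2(g)) ** 2)[:, :, None] * (np.abs(np.fft.fft(h)) ** 2)[None, None, :]
```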

It would be surprising for any space-time-separable random
stimulus I to exhibit strong, consistent motion in a fixed
direction, since the only sort of temporal contrast change
induced by I is a spatially global modulation.

However, as we have hinted in example 1, there do exist
drift-balanced random stimuli that exhibit decisive motion
in a fixed direction not only on the average across a number
of trials but on virtually each display. In Section 6 we shall
provide some general results that are useful for constructing
a broad range of drift-balanced random stimuli that show
strong motion. However, we shall show first that the spatiotemporal
convolution of independent drift-balanced random
stimuli is drift balanced and briefly mention some of
the ramifications of this fact.

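Proposition 2

The (spatiotemporal) convolution of independent, drift-balanced
random stimuli is drift balanced.

Proof

Let I and J be independent drift-balanced random stimuli.
For any random stimuli we have

$$|\widehat{I * J}|^2 = |\hat I|^2\, |\hat J|^2.$$

The independence of I and J implies that

$$E[|\widehat{I * J}|^2] = E[|\hat I|^2]\, E[|\hat J|^2].$$

Thus, since I and J are drift balanced, we find that, for any $\omega$, $\theta$, $\tau$,

$$E[|\widehat{I * J}(\omega, \theta, \tau)|^2] = E[|\hat I(\omega, \theta, -\tau)|^2]\, E[|\hat J(\omega, \theta, -\tau)|^2] = E[|\widehat{I * J}(\omega, \theta, -\tau)|^2].$$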
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Most  computational  models  of  the  sensory  transforma* 
tions  mediating  human  perception  routinely  apply  a  spatio* 
temporal,  linear,  shift-invariant  filter  to  the  input  stimulus. 
The  impulse  response  (i.e.,  convolution  kernel)  of  any  such 
filter  can,  of  course,  be  regarded  as  an  invariant  stimulus. 
Typically  the  filters  applied  are  drift  balanced.*”^**’  Obvi¬ 
ously,  filters  that  depend  on  only  spatial  characteristics  of 
the  stimulus  l^ing  processed  are  drift  balanced  (for  in* 
stance,  all  manner  of  oriented,  band-tuned,  <patial  edge 
detectors).  Similarb',  filters  (such  as  flicker  detectors)  that 
depend  on  only  temporal  stimulus  characteristics  are  drift 
balanced.  More  generally,  all  space-time-separable  filters 
are  drift  balanced  (proposition  1).  Thus,  given  a  drift-bal¬ 
anced  random  input  stimulus,  the  output  of  many  of  the 
filters that are commonly thought to function in the early stages of human visual processing is also drift balanced.

6. CONSISTENT APPARENT MOTION FROM DRIFT-BALANCED STIMULI

We begin this section by noting some general results concerning linear combinations of random stimuli, leading up to proposition 3 below, in which we show that any linear combination of pairwise independent, drift-balanced random stimuli, all of which have expectation 0, is drift balanced. (Actually, this is an implication of proposition 3, which is slightly more general.) From this finding follow corollaries 1 and C1 (C1 in Appendix C), each of which gives rise to specific examples of drift-balanced random stimuli that elicit consistent apparent motion. Several of these examples are detailed in this section. Experimental findings with regard to these example random stimuli are reported.

One may wonder whether linear combinations of independent drift-balanced random stimuli are drift balanced. That this is not the case is evident from the fact that any invariant stimulus whatsoever can be expressed as a linear combination of shifted impulses, which are, of course, jointly independent and individually drift balanced.

Although linear combinations of arbitrary, pairwise independent, drift-balanced random stimuli are not generally drift balanced, if we impose an additional constraint on the random stimuli to be summed we can ensure that the resultant linear combination is indeed drift balanced.

The  following  lemma  bears  on  this  issue. 

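Lemma 1

Let $S$ be a random stimulus equal to the sum of a set $\Omega$ of pairwise independent random stimuli; then

$$E\bigl[|\tilde{S}|^2\bigr] = \Bigl|\sum_{I \in \Omega} \tilde{E}_I\Bigr|^2 + \sum_{I \in \Omega} E\bigl[|\tilde{N}_I|^2\bigr],$$

where $N_I = I - E_I$ for each $I \in \Omega$.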

Proof 

See  Appendix  B. 

Immediately  we  note  a  useful  result  concerning  linear  com¬ 
binations  of  drift-balanced  random  stimuli: 


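Proposition 3

Let $\Omega = \Theta \cup \{I\}$ be a set of pairwise independent, drift-balanced random stimuli, such that $I$ is invariant and each member of $\Theta$ has an expectation of 0. Then any linear combination, $S$, of the elements of $\Omega$ is drift balanced.


Proof

A drift-balanced random stimulus rescaled by a constant is drift balanced. Thus we assume with no loss of generality that $S$ is just a sum of pairwise independent drift-balanced random stimuli.

Note that (i) $I = E_I$ (hence $N_I = I - E_I = 0$) and (ii) for all $J \in \Theta$, $N_J = J - E_J = J$. Thus from lemma 1 we observe for any $\omega, \theta, \tau \in \mathbb{R}$

$$E\bigl[|\tilde{S}(\omega, \theta, \tau)|^2\bigr] = |\tilde{I}(\omega, \theta, \tau)|^2 + \sum_{J \in \Theta} E\bigl[|\tilde{J}(\omega, \theta, \tau)|^2\bigr] = |\tilde{I}(\omega, \theta, -\tau)|^2 + \sum_{J \in \Theta} E\bigl[|\tilde{J}(\omega, \theta, -\tau)|^2\bigr] = E\bigl[|\tilde{S}(\omega, \theta, -\tau)|^2\bigr].$$

Note, in particular, that this result holds for $I = 0$.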

As is reasonably clear from proposition 3 (since space-time-separable random stimuli are drift balanced), any sum of pairwise independent, space-time-separable random stimuli, all with an expectation of 0, is drift balanced. In corollary 1 this principle is applied to generate a class of drift-balanced random stimuli, certain instances of which exhibit strong, consistent apparent motion in a fixed direction.

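Corollary 1

For $M \in \mathbb{Z}^+$, let $\phi_0, \phi_1, \ldots, \phi_{M-1}$ be pairwise independent random variables, each with expectation 0; and, for $m = 0, 1, \ldots, M - 1$, let $f_m: \mathbb{Z}^2 \to \mathbb{R}$ and $g_m: \mathbb{Z} \to \mathbb{R}$, and let the product $f_m g_m$ be 0 at all but finitely many points of $\mathbb{Z}^3$; then the random stimulus $I$ defined by setting

$$I[x, y, t] = \sum_{m=0}^{M-1} \phi_m\, f_m[x, y]\, g_m[t] \qquad (3)$$

is drift balanced.

The proof is obvious from propositions 1 and 3.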
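The step left to the reader can be spelled out directly. The following sketch writes $\phi_m$ for the random contrast variables of the corollary and $\tilde{f}_m$, $\tilde{g}_m$ for the Fourier transforms of $f_m$ and $g_m$ (notation assumed here), and it assumes that each $\phi_m$ has a finite second moment.

```latex
\begin{align*}
\tilde{I}(\omega,\theta,\tau)
  &= \sum_{m=0}^{M-1} \phi_m\, \tilde{f}_m(\omega,\theta)\, \tilde{g}_m(\tau), \\
E\bigl[|\tilde{I}(\omega,\theta,\tau)|^2\bigr]
  &= \sum_{m=0}^{M-1}\sum_{n=0}^{M-1} E[\phi_m \phi_n]\,
     \tilde{f}_m(\omega,\theta)\,\tilde{g}_m(\tau)\,
     \overline{\tilde{f}_n(\omega,\theta)\,\tilde{g}_n(\tau)} \\
  &= \sum_{m=0}^{M-1} E[\phi_m^2]\,
     |\tilde{f}_m(\omega,\theta)|^2\, |\tilde{g}_m(\tau)|^2 .
\end{align*}
```

The cross terms vanish because $E[\phi_m \phi_n] = E[\phi_m]\,E[\phi_n] = 0$ for $m \neq n$. Each $g_m$ is real valued, so $|\tilde{g}_m(\tau)| = |\tilde{g}_m(-\tau)|$, and the last line is unchanged when $\tau$ is replaced by $-\tau$.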

A simple yet compelling counterexample to the MFFC principle may now be constructed as follows.

Demonstration 1: A Randomly Contrast-Reversing, Rightward-Stepping Rectangle

For some $M \in \mathbb{Z}^+$, let the random stimulus $I$ be composed of $M$ frame blocks indexed $0, 1, \ldots, M - 1$, and let each frame block be composed of a horizontal sequence of $M$ rectangles indexed $0, 1, \ldots, M - 1$ from left to right (see example 2 and Fig. 3). Let $\phi_0, \phi_1, \ldots, \phi_{M-1}$ be pairwise independent random variables, each taking the value $C$ or $-C$ with equal probability. Give rectangle $i$ in frame block $i$ the value assumed by $\phi_i$, and give all other pixels the value 0. We can now define $I$ by Eq. (3) by letting $f_m[x, y]$ take the value 1 in the $m$th rectangle and 0 everywhere else and letting $g_m[t]$ take the value 1 in the $m$th frame block and 0 everywhere else.
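Because each contrast variable takes only two values, the expected power of a small instance of this stimulus can be computed exactly by enumerating every sign assignment. The sketch below makes several simplifying assumptions that are not in the text: one spatial dimension, one pixel per rectangle, one frame per frame block, a cyclic discrete Fourier transform, and fully independent signs (which are in particular pairwise independent). All function names are hypothetical.

```python
import cmath
import itertools

M = 4        # number of rectangles and of frame blocks (small, for exact enumeration)
C = 0.25     # contrast magnitude

def stepping_rectangle(signs):
    """I[x][t] = signs[x] * C when x == t, else 0."""
    return [[signs[x] * C if x == t else 0.0 for t in range(M)] for x in range(M)]

def power(stim, k, l):
    """Fourier power of stim at the discrete frequency pair (k, l)."""
    acc = sum(stim[x][t] * cmath.exp(-2j * cmath.pi * (k * x + l * t) / M)
              for x in range(M) for t in range(M))
    return abs(acc) ** 2

def expected_power(k, l):
    """Exact expectation: average over all 2**M equally likely sign assignments."""
    all_signs = list(itertools.product((-1, 1), repeat=M))
    return sum(power(stepping_rectangle(s), k, l) for s in all_signs) / len(all_signs)

expected = [[expected_power(k, l) for l in range(M)] for k in range(M)]
imbalance = max(abs(expected[k][l] - expected[k][(-l) % M])
                for k in range(M) for l in range(M))
print(imbalance)
```

Under these assumptions the expected power is the same at every frequency pair (it equals $M C^2$), so no direction of drift is favored in the expected Fourier power, even though every realization steps rightward.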

The apparent motion of this stimulus is quite easy to imagine: throughout frame block 0, rectangle 0 is present on the left-hand side of the stimulus field; it is assigned contrast of $C$ or $-C$ with equal probability. In frame block 1, rectangle 0 turns off (goes to contrast 0), and rectangle 1, abutting rectangle 0 on the right, turns on, again with contrast $C$ or $-C$ assigned with equal probability, independent of the contrast of the first rectangle. In each successive frame block, one rectangle turns off, and a new rectangle turns on directly to the right of its predecessor, with contrast either $C$ or $-C$, independent of any other rectangle.
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cm  X  33  cxa.  and  di^Iaytd  Btfoicitao  were  peensib  vbtte, 
The spatial resolution was 512 × 512 pixels, the temporal resolution was 60 frames/sec, and the intensity resolution was 256 gray levels.

Two subjects were involved in each of the studies: CC (the experimenter) and DY (a naive subject). For each demonstration, each subject viewed 30 independent realizations of the random stimulus. On each presentation, the non-Fourier motion of the stimulus ($I$, $K$, $J$, $H$, or $G$) was left to right or right to left with equal probability. For instance, $I$'s randomly contrast-reversing rectangle stepped left to right or right to left with equal probability.

Subjects adapted before each session to a uniform screen of luminance ... cd/m²; other luminances were linearized carefully relative to the mean. All stimuli were viewed foveally and binocularly, from a distance of 2 m. On each trial, a central cue spot (0.5 deg × 0.5 deg) of low positive contrast came on 2 sec before the onset of the stimulus and disappeared 1 sec before the onset. Subjects were instructed to maintain their gazes throughout the trial on the cue spot and were required to indicate the predominant direction of apparent motion (left or right) by entering either an L or an R on a terminal keyboard.

Method for Demonstration 1

In the version of $I$ viewed by our subjects, frame blocks lasted 1/60 sec; spatial rectangles measured approximately 2 deg (horizontal) × 2 deg (vertical), and $C = 0.25$. The contrast of 0.25 was chosen because it produced easily visible motion and yet was small enough that psychophysical, as well as physical, equivalence of positive and negative increments was likely to hold.

Results

Subject CC (DY) reported apparent motion in the step direction on 30 (20) of 30 trials.

Discussion

The essential trick of the rightward stepping bar was to modulate the contrast (that is, the absolute deviation from zero) of a field of static, spatially independent, zero-mean noise as a function of space and time. This notion of spatiotemporal modulation of contrast needs some explanation. Let $J$ be a random stimulus with expectation 0, let $W$ be a nonnegative function of $\mathbb{Z}^3$ (space and time), and consider $I = WJ$. In general, $J$'s distance from 0, be it positive or negative, is magnified (or damped) by $W$'s value at each point in space and time. Thus $I$ is obtained by letting $W$ modulate the (absolute) contrast of $J$.

Fig. 4. a, Rightward-stepping, randomly contrast-reversing vertical bar: a horizontal-temporal cross section of a realization of the random stimulus $I$ (see demonstration 1). $I$ is the sum of pairwise independent space-time-separable random stimuli, each of which has an expectation of 0; consequently $I$ is drift balanced (by corollary 1). b, Modulation of the contrast of a static noise field by a drifting sinusoidal grating: a horizontal-temporal cross section of a realization of the random stimulus $K$ (demonstration 2). That $K$ is drift balanced follows from corollary 1. c, Traveling contrast reversal of a noise field: a horizontal-temporal cross section of a realization of the random stimulus $J$ (demonstration 3). $J$ is the sum of pairwise independent space-time-separable random stimuli, each of which has an expectation of 0, and is thus drift balanced (by corollary 1). Note that, in contrast to $|I|$ (for $I$ of Fig. 4a), $|J|$ is devoid of motion information. d, Modulation of the flicker frequency of a flickering noise field by a drifting grating: a horizontal-temporal cross section of a realization of the random stimulus $H$ (demonstration 4). That $H$ is drift balanced is a consequence of corollary C1 (in Appendix C). The motion of $H$ is derived from spatiotemporal modulation of the frequency of sinusoidal flicker, where the phase of the flicker is random over space. e, Modulation of the contrast of a flickering noise field by a drifting sinusoidal grating: a horizontal-temporal cross section of a realization of the random stimulus $G$ (demonstration 5). $G$ is drift balanced (by corollary C1). The motion of $G$ is derived from spatiotemporal modulation of the amplitude of sinusoidal flicker, where the flicker phase is random over space.


To see how this notion applies to $I$ of demonstration 1, note that we can look at $I$ as the result of multiplying a field $J$ of random black or white rectangles persisting through $M$ chunks of time by a function $W$, which (for $m = 0, 1, \ldots, M - 1$) is 1 in the $m$th frame block for the points in the $m$th rectangle from the left and 0 everywhere else.

Elaborations of this basic contrast-modulation scheme are easy to construct. Consider, for instance, demonstration 2.

Demonstration 2: Contrast Modulation of a Static Noise Field by a Drifting Sinusoid

We compose the random stimulus $K$ of $N$ frame blocks, each containing a horizontal row of $M$ rectangles, indexed $0, 1, \ldots, M - 1$ from left to right. For $m = 0, \ldots, M - 1$, let $f_m[x, y]$ take the value 1 in the $m$th rectangle and 0 elsewhere, and let $g_m[t]$ vary as a sinusoidal function of $m$ and the frame block. Specifically, for each frame $t$ in the $n$th frame block, let

$$g_m[t] = \frac{\cos[2\pi(\alpha m/M - \beta n/N)] + 1}{2}$$

for some spatial and temporal frequencies $\alpha$ and $\beta$. Let $\phi_0, \phi_1, \ldots, \phi_{M-1}$ be pairwise independent random variables taking the values $C$ and $-C$ with equal probability, for some contrast $C$, and define $K$ by Eq. (3).
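A small numerical sketch of this construction may help. It assumes a raised-cosine modulator of the form $(\cos[2\pi(\alpha m/M - \beta n/N)] + 1)/2$, one pixel per rectangle, one frame per frame block, and small illustrative values $M = N = 8$ and $\alpha = \beta = 1$; these values and the function name `modulator` are choices made here, not values from the experiments.

```python
import math
import random

def modulator(m, n, M, N, alpha, beta):
    """Raised cosine in [0, 1] for rectangle m in frame block n."""
    return (math.cos(2 * math.pi * (alpha * m / M - beta * n / N)) + 1) / 2

M = N = 8
alpha = beta = 1
C = 0.25

rng = random.Random(0)
phi = [rng.choice((-C, C)) for _ in range(M)]   # static random contrast of each rectangle

# K[m][n]: contrast of rectangle m during frame block n.
K = [[phi[m] * modulator(m, n, M, N, alpha, beta) for n in range(N)] for m in range(M)]

# Position of the highest-contrast rectangle in each frame block.
crest_positions = [max(range(M), key=lambda m: abs(K[m][n])) for n in range(N)]
print(crest_positions)
```

With these settings the crest of high contrast advances by one rectangle per frame block, whatever signs the rectangles happen to carry.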

Whereas $I$ of demonstration 1 merely picks out successive rectangles of spatial noise (independently assigned contrast $C$ or $-C$) in successive time intervals, $K$ is marked by high-power crests ($\alpha$ per frame block) separated by zero-power (gray) troughs sweeping at a constant rate from left to right over the row of rectangles, each of random contrast $C$ or $-C$.

Method

In the version of $K$ viewed by our subjects, frame blocks lasted 1/60 sec, rectangles measured approximately 1/8 deg (horizontal) × 2 deg (vertical), and contrast $C = 0.25$.

Results

The cosine grating modulating the contrast of $K$ was rightward or leftward drifting, with equal probability. Subject CC (DY) reported apparent motion in the direction of drift in 30 (26) of 30 trials.

It may be that humans extract the motion from stimuli such as $I$ (Fig. 4a) and $K$ (Fig. 4b) simply by performing a Fourier power analysis on a rectified version of the stimulus. For instance, if subjects were able either (i) to disregard (set to 0) all negative contrast values or (ii) to map all contrasts onto their absolute values, then it is clear that a Fourier power analysis of the resultant rectified signal would correspond quite well to perceived motion. This explanation does not account for responses to stimuli of the type considered in demonstration 3.
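The rectification account can be illustrated on a toy version of the stepping rectangle. The sketch below assumes one spatial dimension, one pixel per rectangle, one frame per frame block, full-wave rectification (absolute value), and a transform with kernel $\exp(-j(\omega x + \tau t))$, under which rightward drift corresponds to spatial and temporal frequencies of opposite sign. The sign sequence and all names are hypothetical.

```python
import cmath

M = 8
C = 0.25
signs = [1, -1, -1, 1, -1, 1, 1, -1]   # one arbitrary realization of the contrast signs

# Rectangle m is on, with contrast +C or -C, only in frame block m.
stim = [[signs[x] * C if x == t else 0.0 for t in range(M)] for x in range(M)]
rectified = [[abs(v) for v in row] for row in stim]   # full-wave rectification

def signed_frequency(index, n):
    """Map DFT index 0..n-1 to a signed frequency; the ambiguous Nyquist index maps to 0."""
    if 2 * index == n:
        return 0
    return index if 2 * index < n else index - n

def directional_power(s):
    """Total Fourier power in rightward (k*l < 0) and leftward (k*l > 0) components."""
    right = left = 0.0
    for k in range(M):
        for l in range(M):
            acc = sum(s[x][t] * cmath.exp(-2j * cmath.pi * (k * x + l * t) / M)
                      for x in range(M) for t in range(M))
            direction = signed_frequency(k, M) * signed_frequency(l, M)
            if direction < 0:
                right += abs(acc) ** 2
            elif direction > 0:
                left += abs(acc) ** 2
    return right, left

right_power, left_power = directional_power(rectified)
print(right_power, left_power)
```

After rectification the sign information is gone and the stimulus is an ordinary rightward-stepping bar, so essentially all of its directional power falls on rightward components.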

Demonstration 3: Traveling Contrast Reversal of a Random Bar Pattern

Let $M \in \mathbb{Z}^+$. We construct the random stimulus $J$ of $M + 1$ frame blocks indexed $0, 1, \ldots, M$, each of which contains $M$ rectangles indexed $0, 1, \ldots, M - 1$ from left to right. Let $f_m[x, y]$ take the value 1 in the $m$th rectangle and zero elsewhere, and let $g_m[t]$ be 1 in frame blocks 0 through $m$, $-1$ in frame blocks $m + 1$ through $M$, and 0 everywhere else. Let the random variables $\phi_0, \phi_1, \ldots, \phi_{M-1}$ be pairwise independent, each taking a contrast value of $C$ or $-C$ with equal probability, and use Eq. (3) to define $J$.
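The definition can be traced on a small instance. The sketch below assumes one value per rectangle per frame block and uses an arbitrary fixed sign sequence in place of the random contrast variables; the names `g`, `flips`, and `static_magnitude` are hypothetical.

```python
M = 5
C = 0.25
signs = [1, -1, 1, 1, -1]   # one arbitrary realization of the contrast signs

def g(m, block):
    """Temporal factor for rectangle m: +1 in blocks 0..m, -1 in blocks m+1..M."""
    return 1 if block <= m else -1

# J[m][n]: contrast of rectangle m in frame block n, for n = 0..M.
J = [[signs[m] * C * g(m, n) for n in range(M + 1)] for m in range(M)]

# Which rectangles change contrast on entering each frame block 1..M?
flips = [[m for m in range(M) if J[m][n] != J[m][n - 1]] for n in range(1, M + 1)]

# Is the absolute contrast the same everywhere in the display?
static_magnitude = all(abs(v) == C for row in J for v in row)
print(flips, static_magnitude)
```

Exactly one rectangle reverses per frame block, and the reversal travels from left to right. The absolute contrast is constant throughout the display, so a rectified version of this stimulus carries no motion signal.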

In frame block 0 of $J$, all $M$ rectangles turn on, some with contrast $C$ and others with contrast $-C$. In successive frame blocks $m = 1, 2, \ldots, M$, exactly one of the rectangles changes contrast: the $(m - 1)$th switches to $C$ if its previous contrast was $-C$; otherwise it flips from $C$ to $-C$. In frame block 1, the leftmost (0th) rectangle flips contrast; in frame block 2, rectangle 1 flips, and in successive frame blocks, successive rectangles flip contrast from left to right, until the $(M - 1)$th rectangle flips in frame block $M$, after which all the rectangles turn off.

Demonstration 4: ... Spatial Noise with a Drifting Sinusoid

We shall construct the random stimulus $H$ of $N$ frame blocks indexed $0, 1, \ldots, N - 1$, each composed of $M$ rectangles indexed $0, 1, \ldots, M - 1$ from left to right. Let $\rho_0, \rho_1, \ldots, \rho_{M-1}$ be pairwise independent random variables, each uniformly distributed on $[-\pi, \pi)$. Let $C$ be a contrast value. For all $(x, y, t) \in \mathbb{Z}^3$, let


for $m$ indexing the rectangle containing $(x, y)$ and $n$ indexing the frame block containing $t$. The demonstration that $H$ is drift balanced is given in Appendix C.

A realization of $H$, with $N = 32$ and $M = 128$, is shown in Fig. 4d. In frame block 0, the rectangles are assigned random contrasts between $C$ and $-C$ (by a consequence of their independent, random phases). Thereafter, for $m = 0, 1, \ldots, M - 1$, the contrast of the $m$th rectangle is modulated by a cosine whose phase is itself a function of the rectangle and the frame block. Since, however, a sinusoid's frequency is the derivative of its phase (and since the derivative of a sinusoid is a sinusoid of the same frequency), we observe that $H$ modulates, with a drifting sinusoid, the frequency of (spatially random-phased) sinusoidal flicker.

In demonstration 4 the contrast oscillation rate of each rectangle speeds up and slows down sinusoidally throughout the presentation. Regions of equal oscillation rate (crests of rapid sinusoidal flicker separated by troughs of slow modulation) sweep at a constant rate from left to right across the viewing field.

Method

The conditions under which $H$ was presented to subjects CC and DY were the same as those governing the display of $K$ (of demonstration 2). Each frame block lasted 1/60 sec, each spatial rectangle measured 2 deg (vertical) × 1/8 deg (horizontal), and the contrast $C = 0.25$.

Results

Interestingly, despite the striking diagonal contours marking the (x, y) pattern of Fig. 4d, both subjects reported that the motion of $H$ was generally more ambiguous than those of the other stimuli. CC (DY) reported apparent motion in the drift direction of the sinusoid modulating frequency of contrast oscillation on 28 (23) of 30 trials.


Method 

The version of $J$ viewed by subjects CC and DY contained nine frame blocks, each of which lasted 1/60 sec and contained eight spatial rectangles, each measuring approximately 2 deg × 2 deg; $C = 0.25$.

Results 

CC  (DY)  reported  apparent  motion  in  the  direction  traveled 
by  the  contrast  flip  in  30  (25)  of  30  trials. 

The next two stimuli ($H$ of demonstration 4 and $G$ of demonstration 5) are both drift balanced. The proof of this fact depends on a corollary to proposition 3 that is otherwise unimportant. We relegate this corollary to Appendix C and show there how it can be applied to construct each of $H$ and $G$.


Demonstration 5: Modulating the Contrast of Flickering Noise with a Drifting Sinusoid

The random stimulus $G$ is made up of $N$ frame blocks indexed $0, 1, \ldots, N - 1$, each containing $M$ rectangles indexed $0, 1, \ldots, M - 1$ from left to right. Let $\rho_0, \rho_1, \ldots, \rho_{M-1}$ be pairwise independent random variables, each uniformly distributed on $[-\pi, \pi)$. Let $C$ be some contrast value; then, for any $(x, y, t) \in \mathbb{Z}^3$, set

$G[x, y, t]$

where $m$ indexes the rectangle containing $(x, y)$ and $n$ indexes the frame block containing $t$. The proof that $G$ is drift balanced is given in Appendix C.

A realization of $G$ with $M = N = 32$, $\alpha = \beta = 2$, and $\gamma = 3$ is shown in Fig. 4e. As does $K$ of demonstration 2, $G$ creates its apparent motion by modulating contrast as a drifting sinusoidal function of the rectangle and the frame block. However, whereas the background whose contrast is being modulated in $K$ is a static row of rectangles randomly painted $C$ or $-C$, the background whose power is modulated in $G$ is a row of rectangles flickering sinusoidally between $C$ and $-C$; each rectangle $m$ has a randomly assigned phase ($\rho_m$) and is oscillating at the rate of 3/32 cycles/frame block (as a consequence of the term $2\pi \cdot 3n/32$).

The contrast of $G$'s flickering rectangle row is modulated by the factor


which sweeps peaks (two per frame) of high-contrast flicker separated by troughs of mean gray across the viewing field from left to right.

Method

The conditions governing the display of $G$ were the same as those for $K$ (and $H$): Frame blocks lasted 1/60 sec, spatial rectangles measured 2 deg (vertical) × 1/8 deg (horizontal), and $C = 0.25$.


Results

CC (DY) registered apparent motion in the drift direction of the sinusoid modulating noise contrast in $G$ on 30 (26) of 30 trials.

Conclusions

In  this  section  we  have  demonstrated  five  drift-balanced 
random  stimuli  whose  apparent  motion  is  perceived  in  one 
consistent  direction  in  more  than  90%  of  trials  by  two  ob¬ 
servers.  Indeed,  many  other  observers  have  viewed  these 
stimuli,  and  no  one  has  yet  failed  to  perceive  their  consistent 
motion.  As  is  discussed  in  Section  8  below,  these  stimuli  are 
microbalanced  in  addition  to  being  drift  balanced;  that  is, 
they  remain  drift  balanced  after  windowing  by  arbitrary 
space-time-separable  functions.  We  conclude  that  there  is 
a  large  class  of  random  stimuli  whose  apparent  motion  con¬ 
tradicts  the  MFFC  principle  of  motion  perception. 

There  are  many  kinds  of  drift-balanced  and  microba¬ 
lanced  random  stimuli  that  were  not  represented  among  the 
demonstrations  described  here.  In  this  paper  we  have  re¬ 
stricted  ourselves  to  stimuli  that  assign  constant  values  in 
the  vertical  dimension  of  space.  Dropping  this  constraint 
opens  the  door  to  a  broad  range  of  other  drift-balanced  and 
microbalanced random stimuli. In particular, a large class of displays that yield apparent motion is generated by defining two spatiotemporal texture fields, $A$ and $B$, at each point $(x, y, t) \in \mathbb{Z}^3$ and moving a boundary that admits light only from field $A$ on one side and only from $B$ on the other. Many instances of this kind of apparent motion, including those proposed by Victor, can easily be shown to be microbalanced.


7. REICHARDT-DETECTOR CHARACTERIZATION OF DRIFT-BALANCED RANDOM STIMULI

A point-delay Reichardt detector is a simple device that was proposed originally by Reichardt to explain the vision of beetles. Its basic principle, autocorrelation of inputs at nearby visual locations, underlies most of the currently predominant models of human motion perception. We define the Reichardt detector in terms of two subunits, designated for convenience as the left and right half-detectors. Both half-detectors are defined with respect to the same two (spatial) locations $(x, y)$ and $(p, q)$ in $\mathbb{Z}^2$ and for some fixed nonnegative number $\delta_t$ of frames. These oppositely oriented detectors are pitted additively against each other. A left half-detector $r_{\mathrm{left}}$ [implicitly indexed by $(x, y)$, $(p, q)$, and $\delta_t$] computes the covariance over time of the contrast at point $(x, y)$ at time $t$ with the contrast at point $(p, q)$ at time $t - \delta_t$ throughout the display of an arbitrary stimulus $I$. For $r_{\mathrm{right}}$, $(x, y)$ and $(p, q)$ are reversed. The computation performed by the full detector $r$ is given by

$$r(I) = \sum_{t \in \mathbb{Z}} I[x, y, t]\, I[p, q, t - \delta_t] - \sum_{t \in \mathbb{Z}} I[p, q, t]\, I[x, y, t - \delta_t].$$

When $r(I) < 0$, it indicates motion from $(x, y)$ to $(p, q)$.

Figure 5 illustrates a block-diagram representation of the Reichardt half-detectors and the Reichardt full detector. The box containing $(x, y)$ [respectively, $(p, q)$] is a contrast gauge, inputting the contrast at point $(x, y)$ [$(p, q)$] for each successive frame $t$. Each of the boxes containing $\delta_t$ is a delay filter. At frame $t$, each delay box outputs the value entered into it at frame $t - \delta_t$. Each of the boxes marked with a × outputs the product of its two inputs at any frame $t$. Each of the boxes marked with a Σ accumulates the output from the multipliers over all the frames. Finally, the box marked with a − outputs the difference of its inputs at any frame $t$.

To see how the detector shown in Fig. 5c works, consider a point of light moving across a dark visual field so as to cross first $(x, y)$ and then $(p, q)$. If the spot is moving at the proper rate [so that it starts crossing $(p, q)$ after precisely $\delta_t$ frames], then the output from the right-hand multiplier will be high as the dot passes over $(p, q)$. In contrast, the output from the left-hand multiplier will be low throughout the presentation of the moving dot, since, at any frame, at least one of its input channels is contributing a value near zero. Thus the output of the detector is negative. On the other hand, if the dot passes first over $(p, q)$ and then over $(x, y)$, the detector's response is positive. In this simple case, the sign of the detector's output does a good job of signaling the direction of the dot's motion.
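The moving-dot argument is easy to reproduce in a few lines. The sketch below assumes one spatial dimension, a stimulus stored as `stim[position][frame]`, a dark field coded as 0 and the dot as 1, and sums truncated to the frames of the display. The function names are hypothetical, and this illustrates the delay-and-compare principle rather than any particular published implementation.

```python
def reichardt_response(stim, x, p, delay):
    """Point-delay Reichardt output for stim[position][frame] at positions x and p.

    left  half-detector: sum over t of stim[x][t] * stim[p][t - delay]
    right half-detector: sum over t of stim[p][t] * stim[x][t - delay]
    The full detector returns left - right.
    """
    frames = len(stim[0])
    left = sum(stim[x][t] * stim[p][t - delay] for t in range(delay, frames))
    right = sum(stim[p][t] * stim[x][t - delay] for t in range(delay, frames))
    return left - right

def moving_dot(width, frames, start, velocity):
    """A single dot (value 1) on a dark field (value 0) moving at `velocity` pixels/frame."""
    return [[1.0 if pos == start + velocity * t else 0.0 for t in range(frames)]
            for pos in range(width)]

# Detector sites 2 pixels apart with a 2-frame delay match a dot speed of 1 pixel/frame.
rightward = reichardt_response(moving_dot(8, 8, 0, 1), x=2, p=4, delay=2)
leftward = reichardt_response(moving_dot(8, 8, 7, -1), x=2, p=4, delay=2)
print(rightward, leftward)   # -1.0 1.0
```

A dot that crosses position 2 and then position 4 drives only the right half-detector and gives a negative output; the reversed path drives only the left half-detector and gives a positive output, in line with the sign convention of the text.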

However, the point-delay Reichardt detector is highly vulnerable to aliasing. Imagine a train of evenly spaced dots passing at some speed $s$ first over $(x, y)$ and then over $(p, q)$. For any $s$, it is easy to adjust the spacing between dots so that the output of the Reichardt detector of Fig. 5c signals rightward motion, leftward motion, or no motion at all.

Fig. 5. Point-delay Reichardt detector and its component half-detectors. a, The right half-detector computes the covariance of the contrast fluctuations of the input stimulus at point $(p, q)$ with the fluctuations $\delta_t$ frames earlier at point $(x, y)$: $(x, y)$ and $(p, q)$ register signal contrast frame by frame. The contrast of the current frame at pixel $(p, q)$ is multiplied by the contrast at pixel $(x, y)$ $\delta_t$ frames in the past. (The box labeled $\delta_t$ outputs the input it received $\delta_t$ frames ago.) The output from the multiplier is accumulated over all the frames of the display. b, In a similar fashion, the left half-detector computes the covariance of the contrast fluctuations of the input stimulus at point $(x, y)$ with the fluctuations $\delta_t$ frames earlier at point $(p, q)$. c, The full point-delay Reichardt detector outputs the difference between the left and right half-detectors. A positive response thus signals leftward motion; a negative response signals rightward motion.

Despite the shortcomings of the simple Reichardt detector, there is something appealing about its fundamental autocorrelation principle. Various elaborations of Reichardt models were developed and studied in detail by van Santen and Sperling, who proved that the apparently different models of Adelson and Bergen and Watson and Ahumada were essentially special types of elaborated Reichardt detectors (ERD's). All these models retain the basic delay-and-compare structure of the simple detector diagrammed in Fig. 5c. However, this simple detector is generalized in the following ways: (i) the point detectors at $(x, y)$ and $(p, q)$ are replaced by spatial receptive fields (that is, each receptive field applies an array of weights to the stimulus impinging upon its region of the retina, and it outputs the sum of the weighted contrast values), (ii) the temporal point delays before the multipliers are replaced by temporal filters, and (iii) the temporal accumulators after the multipliers are replaced by temporal filters. Van Santen and Sperling showed that further additions (e.g., more temporal filters added here and there) do not augment the capabilities of this ERD.

It was widely assumed that, ideally, a good motion detector should behave as a frequency-domain power analyzer. (This is the assumption called into question by the demonstration of good apparent motion in drift-balanced stimuli.) The simple point-delay Reichardt detector falls short of this ideal: it is not a good Fourier analyzer. The various elaborations of Reichardt detectors can be viewed as attempts to improve their performance as frequency-domain power analyzers.

There is another way to use the Reichardt mechanism as the basis of a motion-perception model. Indeed, as we shall observe, it is possible to build a perfect Fourier power analyzer by using only the simplest point-delay half-detectors.

Our main purpose in this section, however, is to provide an alternative characterization of the class of drift-balanced random stimuli, in terms of the expected responses of point-delay Reichardt detectors to members of this class. We prove the following proposition: For any integers $\delta_x$, $\delta_y$, and $\delta_t$, form the class $C_{\delta_x, \delta_y, \delta_t}$ of all point-delay Reichardt detectors conforming to Fig. 5c [with $(x, y)$ and $(p, q)$ ranging throughout $\mathbb{Z}^2$] such that $(x, y) - (p, q) = (\delta_x, \delta_y)$, and call $C_{\delta_x, \delta_y, \delta_t}$ trivial if either $(\delta_x, \delta_y) = (0, 0)$ or $\delta_t = 0$; that is, $C_{\delta_x, \delta_y, \delta_t}$ is trivial if its member detectors fail to separate, either in space or time, the points whose contrast they compare. $I$ is then drift balanced iff the expected pooled response of every nontrivial class of point-delay Reichardt detectors is 0. We now proceed more formally.

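Definition 4: Autocorrelation

Let $I$ be a random stimulus. Then for any $\delta = (\delta_x, \delta_y, \delta_t) \in \mathbb{Z}^3$ define the autocorrelation, $H_I$, by

$$H_I[\delta_x, \delta_y, \delta_t] = \sum I[x, y, t]\, I[p, q, r],$$

where the sum is taken over all pairs $(x, y, t), (p, q, r) \in \mathbb{Z}^3$ for which $(x, y, t) - (p, q, r) = (\delta_x, \delta_y, \delta_t)$. Define the full-detector pooler, $R_I$, by setting

$$R_I[\delta_x, \delta_y, \delta_t] = H_I[\delta_x, \delta_y, \delta_t] - H_I[-\delta_x, -\delta_y, \delta_t].$$

We use $H_I$ to denote the autocorrelation of $I$ because, for any $\delta_x$, $\delta_y$, $\delta_t$, $H_I[\delta_x, \delta_y, \delta_t]$ collects the sum of the responses to $I$ of all the half-detectors conforming to Fig. 5b, with $\delta_t$ delay filters, such that $(x, y) - (p, q) = (\delta_x, \delta_y)$. The half-detectors corresponding to Fig. 5a are pooled by $H_I[-\delta_x, -\delta_y, \delta_t]$.

Thus $R_I[\delta_x, \delta_y, \delta_t]$ pools the output of all full Reichardt detectors corresponding to Fig. 5c, with $(x, y) - (p, q) = (\delta_x, \delta_y)$ (and $\delta_t$ delay filters).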

Observation 2

For any random (or nonrandom) stimulus $I$ and any $\delta = (\delta_x, \delta_y, \delta_t) \in \mathbb{Z}^3$,

$$H_I[\delta_x, \delta_y, \delta_t] = H_I[-\delta_x, -\delta_y, -\delta_t].$$

The proof is trivial.

In order to reclaim Fourier motion information from the half-detector output, note first that, for any random stimulus $I$,

$$|\tilde{I}(\omega, \theta, \tau)|^2 = \sum I[x, y, t]\, I[p, q, r]\, \exp\bigl(j(\omega(x - p) + \theta(y - q) + \tau(t - r))\bigr), \qquad (4)$$

where the sum is taken over all $(x, y, t), (p, q, r) \in \mathbb{Z}^3$. We can now collect terms of the sum in Eq. (4) that have identical exponential factors to obtain

$$|\tilde{I}(\omega, \theta, \tau)|^2 = \sum H_I[\delta_x, \delta_y, \delta_t]\, \exp\bigl(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)\bigr), \qquad (5)$$

where the sum is over all $(\delta_x, \delta_y, \delta_t) \in \mathbb{Z}^3$.

Equation (5) shows that point-delay half-detectors, by themselves, contain all the information about the distribution of $I$'s power in the Fourier domain (because $H_I$ depends on only the output of half-detectors to $I$).
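The relation between pooled half-detector output and Fourier power can be checked by brute force on a small stimulus. The sketch below assumes one spatial dimension, a stimulus that is 0 outside a 3 × 4 grid, and a transform with kernel $\exp(-j(\omega x + \tau t))$; because the autocorrelation is symmetric, the sign used in the exponent of the pooled sum does not affect the result. The function names are hypothetical.

```python
import cmath
import random

def autocorrelation(stim):
    """H[(dx, dt)]: sum of stim[x][t] * stim[p][r] over pairs with (x - p, t - r) = (dx, dt)."""
    nx, nt = len(stim), len(stim[0])
    H = {}
    for x in range(nx):
        for t in range(nt):
            for p in range(nx):
                for r in range(nt):
                    key = (x - p, t - r)
                    H[key] = H.get(key, 0.0) + stim[x][t] * stim[p][r]
    return H

def power_direct(stim, w, tau):
    """Squared magnitude of the Fourier transform at real frequencies (w, tau)."""
    acc = sum(v * cmath.exp(-1j * (w * x + tau * t))
              for x, row in enumerate(stim) for t, v in enumerate(row))
    return abs(acc) ** 2

def power_from_autocorrelation(H, w, tau):
    """Pooled half-detector outputs weighted by complex exponentials, one spatial dimension."""
    return sum(h * cmath.exp(1j * (w * dx + tau * dt)) for (dx, dt), h in H.items()).real

rng = random.Random(1)
stim = [[rng.choice((-0.25, 0.25)) for _ in range(4)] for _ in range(3)]
H = autocorrelation(stim)
w, tau = 0.7, 1.9                     # arbitrary test frequencies
direct = power_direct(stim, w, tau)
pooled = power_from_autocorrelation(H, w, tau)
print(direct, pooled)                 # the two values agree up to rounding
```

The dictionary `H` is built only from products of contrasts at pairs of points, which is exactly what point-delay half-detectors compute.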

The  next  definition  is  useful  for  proving  the  main  result  of 
this  section. 

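Definition 5: Power Difference between Oppositely Drifting Fourier Components

For any random stimulus $I$ and any $\omega, \theta, \tau \in \mathbb{R}$, set

$$\Delta_I(\omega, \theta, \tau) = |\tilde{I}(\omega, \theta, \tau)|^2 - |\tilde{I}(\omega, \theta, -\tau)|^2.$$

Note that any random stimulus $I$ is drift balanced iff $E[\Delta_I(\omega, \theta, \tau)] = 0$ for all $\omega, \theta, \tau \in [0, 2\pi)$. Some facts about $\Delta_I$ are worth noting. First,

$$\Delta_I(\omega, \theta, \tau) = \sum \bigl(H_I[\delta_x, \delta_y, \delta_t] - H_I[\delta_x, \delta_y, -\delta_t]\bigr) \exp\bigl(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)\bigr)$$

$$= \sum \bigl(H_I[\delta_x, \delta_y, \delta_t] - H_I[-\delta_x, -\delta_y, \delta_t]\bigr) \exp\bigl(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)\bigr)$$

$$= \sum R_I[\delta_x, \delta_y, \delta_t]\, \exp\bigl(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)\bigr),$$

where each sum is over all $(\delta_x, \delta_y, \delta_t) \in \mathbb{Z}^3$. The first identity depends on the fact that

$$|\tilde{I}(\omega, \theta, -\tau)|^2 = \sum H_I[\delta_x, \delta_y, \delta_t]\, \exp\bigl(j(\omega\delta_x + \theta\delta_y - \tau\delta_t)\bigr) = \sum H_I[\delta_x, \delta_y, -\delta_t]\, \exp\bigl(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)\bigr).$$

The second identity follows from observation 2.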

Next note that any term

$$(H_I[\delta_x, \delta_y, \delta_t] - H_I[\delta_x, \delta_y, -\delta_t]) \exp(j(\omega\delta_x + \theta\delta_y + \tau\delta_t))$$

in the sum yielding $\Delta_I(\omega, \theta, \tau)$ is obviously 0 if $\delta_t = 0$. On the other hand, this term is equal (by observation 2) to

$$(H_I[\delta_x, \delta_y, \delta_t] - H_I[-\delta_x, -\delta_y, \delta_t]) \exp(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)),$$

which is evidently 0 if $\delta_x = \delta_y = 0$. This goes to show that for any $\delta_x, \delta_y, \delta_t \in \mathbb{Z}$, any class of Reichardt half-detectors, each of whose members has (i) no separation between spatial receptors or (ii) a delay factor of 0, does not influence $\Delta_I(\omega, \theta, \tau)$.

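The following lemma summarizes these observations.

Lemma 2

For any random stimulus $I$ and any $\omega, \theta, \tau \in \mathbb{R}$,

$$\Delta_I(\omega, \theta, \tau) = \sum R_I[\delta_x, \delta_y, \delta_t] \exp(j(\omega\delta_x + \theta\delta_y + \tau\delta_t)), \quad (6)$$

where the sum is taken over all integers $\delta_x$, $\delta_y$, and $\delta_t$ such that $\delta_t \neq 0$ and either $\delta_x \neq 0$ or $\delta_y \neq 0$.

Obviously, if $E[R_I[\delta_x, \delta_y, \delta_t]] = 0$ for all $\delta_x$, $\delta_y$, and $\delta_t$ indexing the sum in Eq. (6), then $E[\Delta_I(\omega, \theta, \tau)] = 0$. This proves half of the following proposition.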

Proposition 4

A random stimulus is drift balanced iff the expected pooled output from every nontrivial class of Reichardt detectors is 0; that is, any random stimulus $I$ is drift balanced iff

$$E[R_I[\delta_x, \delta_y, \delta_t]] = 0 \quad (7)$$

for all integers $\delta_x$, $\delta_y$, and $\delta_t$ such that $\delta_t \neq 0$ and $(\delta_x, \delta_y) \neq (0, 0)$.

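Proof

We have already observed that Eq. (7) implies that $I$ is drift balanced. It remains to be proved that Eq. (7) holds whenever $I$ is drift balanced. Accordingly, let $\Omega$ be the set of all $(\delta_x, \delta_y, \delta_t)$ for which $\delta_t \neq 0$ and $(\delta_x, \delta_y) \neq (0, 0)$, and suppose that, for any $\omega, \theta, \tau \in [0, 2\pi)$,

$$E[\Delta_I(\omega, \theta, \tau)] = 0.$$

When we take expectations of both sides of Eq. (6), and multiply each side of the resulting identity by its conjugate, we obtain

$$|E[\Delta_I(\omega, \theta, \tau)]|^2 = \sum E[R_I[\delta_x, \delta_y, \delta_t]]\, E[R_I[\delta_p, \delta_q, \delta_r]] \exp(j(\omega(\delta_x - \delta_p) + \theta(\delta_y - \delta_q) + \tau(\delta_t - \delta_r))), \quad (8)$$

where the sum is over all $(\delta_x, \delta_y, \delta_t), (\delta_p, \delta_q, \delta_r) \in \Omega$. However, recalling that

$$\int_0^{2\pi} \exp(j\omega(\delta_x - \delta_p))\, d\omega \int_0^{2\pi} \exp(j\theta(\delta_y - \delta_q))\, d\theta \int_0^{2\pi} \exp(j\tau(\delta_t - \delta_r))\, d\tau = \begin{cases} (2\pi)^3 & \text{if } \delta_x = \delta_p,\ \delta_y = \delta_q,\ \delta_t = \delta_r \\ 0 & \text{otherwise,} \end{cases}$$

we find that when we integrate both sides of Eq. (8) over the interval $[0, 2\pi)^3$ and divide through by $(2\pi)^3$, we obtain

$$\sum E^2[R_I[\delta_x, \delta_y, \delta_t]] = \frac{1}{(2\pi)^3} \int_0^{2\pi}\!\int_0^{2\pi}\!\int_0^{2\pi} |E[\Delta_I(\omega, \theta, \tau)]|^2\, d\omega\, d\theta\, d\tau,$$

where the sum is over all $(\delta_x, \delta_y, \delta_t) \in \Omega$. But the right-hand side of this identity is 0 by assumption. Thus, since each term in the left-hand sum is nonnegative, each must be 0. ∎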

For current purposes, the importance of the Reichardt-detector characterization of the class of drift-balanced random stimuli (established in proposition 4) is that it provides easy access to the principal results concerning the critical subclass of drift-balanced random stimuli that we call microbalanced. This is the focus of Section 8.

8. MICROBALANCED RANDOM STIMULI

Consider the following two-frame-block stimulus $S$: In frame block 0, a bright spot (call it spot 0) appears. In frame block 1, spot 0 disappears, and two new spots appear, one on each side of spot 0. On the one hand, it is clear (from proposition 4) that $S$ is drift balanced. On the other hand, it is equally clear that a Fourier-based motion detector whose spatial reach encompassed the location of spot 0 and only one of the flashes in frame block 1 might be stimulated strongly in a fixed direction by $S$. Although $S$ is drift balanced, some local Fourier motion detectors would be stimulated strongly and systematically by $S$. These detectors can be selected differentially by spatial windowing, and thereby a drift-balanced stimulus $S$ can be converted into a non-drift-balanced stimulus.
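The spot-splitting example can be worked through with the pooler of definition 4. The sketch below is only illustrative: it assumes unit-luminance spots at hypothetical positions $x = 0, 1, 2$ on a single row, treats the nonrandom stimulus as its own expectation, and uses helper names (`autocorrelation`, `pooled_outputs`, `is_drift_balanced`) that are not from the paper.

```python
import itertools


def autocorrelation(stim):
    """H[d]: sum of I[a] * I[b] over all point pairs with a - b == d."""
    h = {}
    for (a, va), (b, vb) in itertools.product(stim.items(), repeat=2):
        d = tuple(ai - bi for ai, bi in zip(a, b))
        h[d] = h.get(d, 0) + va * vb
    return h


def pooled_outputs(stim):
    """R[dx, dy, dt] for every nontrivial offset at which H is nonzero."""
    h = autocorrelation(stim)
    return {(dx, dy, dt): h[(dx, dy, dt)] - h.get((-dx, -dy, dt), 0)
            for (dx, dy, dt) in h
            if dt != 0 and (dx, dy) != (0, 0)}


def is_drift_balanced(stim):
    """Proposition 4 for a nonrandom stimulus: every pooled output is 0."""
    return all(r == 0 for r in pooled_outputs(stim).values())


# Points are (x, y, t); values are luminances relative to the background.
S = {(1, 0, 0): 1,   # frame block 0: spot 0
     (0, 0, 1): 1,   # frame block 1: new spot on the left
     (2, 0, 1): 1}   # frame block 1: new spot on the right

# A spatial window covering only x = 0 and x = 1 (spot 0 and the left flash).
windowed = {p: v for p, v in S.items() if p[0] in (0, 1)}

print(is_drift_balanced(S))         # True
print(is_drift_balanced(windowed))  # False
print(pooled_outputs(windowed))
```

In the full stimulus the two flashes contribute equal and opposite terms to every pooled output; the window removes one flash and leaves the other term uncancelled.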

In this section we introduce the class of microbalanced random stimuli, a subclass of drift-balanced random stimuli, any member $I$ of which is guaranteed not to stimulate Fourier-power motion detectors in any systematic way, regardless of any space-time-separable window interposed between $I$ and the detector. As we shall prove in proposition 8 below, $I$ possesses this property if $I$ satisfies the following definition.

Definition 6: Microbalanced Stimulus

Call any random stimulus $I$ microbalanced iff, for any $(x, y, t), (x', y', t') \in \mathbb{Z}^3$,

$$E[I[x, y, t]\, I[x', y', t']] = E[I[x, y, t']\, I[x', y', t]].$$

Obviously, for any random spatial function $f$ and temporal random function $g$,

$$E[f[x, y]\, g[t]\, f[x', y']\, g[t']] = E[f[x, y]\, g[t']\, f[x', y']\, g[t]],$$

yielding the following proposition.

Proposition 5

Any space-time-separable random stimulus is microbalanced.
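Definition 6 can be tested exactly whenever a random stimulus takes only finitely many realizations. The sketch below assumes such an ensemble, given as (probability, realization) pairs. The three test stimuli are hypothetical: a nonrandom stepping bar, a stepping bar with independently random contrast per frame (a simplified stand-in for a contrast-reversing bar), and a separable stimulus with a random sign.

```python
import itertools
from fractions import Fraction


def second_moment(ensemble, a, b):
    """E[I[a] * I[b]] for a finite ensemble of (probability, realization)."""
    return sum(p * real.get(a, 0) * real.get(b, 0) for p, real in ensemble)


def is_microbalanced(ensemble):
    """Check Definition 6 on the grid spanned by the ensemble's support."""
    points = set().union(*(real.keys() for _, real in ensemble))
    places = {(x, y) for x, y, _ in points}
    times = {t for _, _, t in points}
    for (x, y), (x2, y2) in itertools.product(places, repeat=2):
        for t, t2 in itertools.product(times, repeat=2):
            lhs = second_moment(ensemble, (x, y, t), (x2, y2, t2))
            rhs = second_moment(ensemble, (x, y, t2), (x2, y2, t))
            if lhs != rhs:
                return False
    return True


def stepping_bar(contrasts):
    """One-pixel bar at x = t whose contrast in frame t is contrasts[t]."""
    return {(t, 0, t): c for t, c in enumerate(contrasts)}


# Nonrandom rightward-stepping bar: one realization with probability 1.
plain_bar = [(Fraction(1), stepping_bar((1, 1, 1)))]

# Same bar, but each frame's contrast is an independent fair choice of +1/-1.
signs = list(itertools.product((1, -1), repeat=3))
reversing_bar = [(Fraction(1, len(signs)), stepping_bar(s)) for s in signs]

# Space-time-separable random stimulus f[x, y] * g[t] with a random sign.
f = {(0, 0): 2, (1, 0): -1}
g = {0: 1, 1: 3}
separable = [(Fraction(1, 2), {(x, y, t): s * fv * gv
                               for (x, y), fv in f.items()
                               for t, gv in g.items()})
             for s in (1, -1)]

print(is_microbalanced(plain_bar))      # False
print(is_microbalanced(reversing_bar))  # True
print(is_microbalanced(separable))      # True
```

Points outside the support grid contribute 0 to both sides of the definition, so checking the grid suffices. Exact fractions are used so that the equality test is not affected by rounding.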

A related result is stated in the next proposition.

On the other hand, if $(x, y) \neq (x', y')$ and $t \neq t'$, invariance and microbalancedness together imply that

$$I[x, y, t]\, I[x', y', t'] = I[x, y, t']\, I[x', y', t].$$

An important property of microbalanced random stimuli that sets them apart from the more general class of drift-balanced random stimuli is explained in proposition 7.

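Proposition 7

The product of independent microbalanced random stimuli $I$ and $J$ is microbalanced.

Proof

For any $(x, y, t), (x', y', t') \in \mathbb{Z}^3$,

$$E[IJ[x, y, t]\, IJ[x', y', t']] = E[I[x, y, t]\, I[x', y', t']]\, E[J[x, y, t]\, J[x', y', t']]$$

$$= E[I[x, y, t']\, I[x', y', t]]\, E[J[x, y, t']\, J[x', y', t]]$$

$$= E[IJ[x, y, t']\, IJ[x', y', t]]. \quad ∎$$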

Earlier in this section we showed, by using the example of a single spot splitting into two adjacent spots, that a drift-balanced random stimulus ($S$) can systematically stimulate motion detectors that operate on restricted regions of $S$. With proposition 8 we shall establish that all and only those random stimuli that are microbalanced avoid the systematic stimulation of all local (and global) Fourier-power detectors. The following lemma eases the proof of this important fact.

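Lemma 3

Any microbalanced random stimulus is drift balanced.

Proof

Let $I$ be microbalanced. From proposition 4, $I$ is drift balanced iff $E[H_I[\delta_x, \delta_y, \delta_t]] = E[H_I[\delta_x, \delta_y, -\delta_t]]$ for any offset $(\delta_x, \delta_y, \delta_t) \in \mathbb{Z}^3$ such that $(\delta_x, \delta_y) \neq (0, 0)$ and $\delta_t \neq 0$. However, since $I$ is microbalanced, we note that for any such $(\delta_x, \delta_y, \delta_t)$,

$$E[H_I[\delta_x, \delta_y, \delta_t]] = \sum E[I[x, y, t]\, I[x - \delta_x, y - \delta_y, t - \delta_t]]$$

$$= \sum E[I[x, y, t - \delta_t]\, I[x - \delta_x, y - \delta_y, t]]$$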

Proposition 6

Any invariant microbalanced stimulus $I$ is space-time separable.

Proof

If $I = 0$, there is nothing to prove (since, obviously, 0 is space-time separable). Otherwise we choose a point $(x', y', t') \in \mathbb{Z}^3$ for which $I[x', y', t'] \neq 0$, and, for all $(x, y, t) \in \mathbb{Z}^3$, we define

$$f(x, y) = I[x, y, t']$$

and

$$g(t) = \frac{I[x', y', t]}{I[x', y', t']}.$$

If either $(x, y) = (x', y')$ or $t = t'$, then immediately we obtain $I[x, y, t] = f(x, y)\, g(t)$.


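$$= E\Big[\sum I[x, y, t - \delta_t]\, I[x - \delta_x, y - \delta_y, t]\Big]$$

$$= E\Big[\sum I[x, y, t]\, I[x - \delta_x, y - \delta_y, t + \delta_t]\Big] = E[H_I[\delta_x, \delta_y, -\delta_t]],$$

where each of the sums is over all $(x, y, t) \in \mathbb{Z}^3$. ∎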

We can now state the main result of this section.

Proposition 8

For any random stimulus $I$, the following conditions are equivalent:

I. $I$ is microbalanced.

II. For any space-time-separable function $W$, $WI$ is drift balanced.

Proof

First we prove that condition I implies condition II. Assume that $I$ is microbalanced. By proposition 5, $W$ is also microbalanced; it thus follows from proposition 7 that $WI$ is microbalanced and hence drift balanced (from lemma 3).

Next we prove that not condition I implies not condition II. Suppose that $I$ is not microbalanced; then for some $(x, y, t), (x', y', t') \in \mathbb{Z}^3$,

$$E[I[x, y, t]\, I[x', y', t']] \neq E[I[x, y, t']\, I[x', y', t]].$$

[Note that this inequality implies that $(x, y) \neq (x', y')$ and $t \neq t'$.] Let $f$ assign 1 to $(x, y)$ and $(x', y')$, and let it assign 0 to all other points of $\mathbb{Z}^2$, and let $g$ assign 1 to $t$ and $t'$ and 0 to all other points of $\mathbb{Z}$. Then the function $fgI$ is zero everywhere except at the points $(x, y, t)$, $(x, y, t')$, $(x', y', t)$, and $(x', y', t')$. It is obvious, from proposition 4, that $fgI$ is not drift balanced. In particular,

$$E[R_{fgI}[x - x', y - y', t - t']] = E[I[x, y, t]\, I[x', y', t']] - E[I[x, y, t']\, I[x', y', t]] \neq 0.$$

The results stated thus far in this section would not be interesting if there were no microbalanced random stimuli that displayed consistent apparent motion. The following result makes it clear that, in fact, all the examples of drift-balanced random stimuli that we considered previously are microbalanced.

Proposition 9

Let $\Gamma$ be a family of pairwise independent, microbalanced random stimuli, all but at most one of which have an expectation of 0; then any linear combination of $\Gamma$ is microbalanced.


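Proof

Since a microbalanced random stimulus multiplied by a constant remains microbalanced, we assume that the linear combination is a sum; then, for any $(x, y, t), (x', y', t') \in \mathbb{Z}^3$,

$$E\Big[\sum_{I \in \Gamma} I[x, y, t] \sum_{J \in \Gamma} J[x', y', t']\Big] = \sum_{I, J \in \Gamma} E[I[x, y, t]\, J[x', y', t']]. \quad (9)$$

However, whenever $I \neq J$,

$$E[I[x, y, t]\, J[x', y', t']] = E[I[x, y, t]]\, E[J[x', y', t']] = 0.$$

Thus Eq. (9) becomes

$$\sum_{I \in \Gamma} E[I[x, y, t]\, I[x', y', t']] = \sum_{I \in \Gamma} E[I[x, y, t']\, I[x', y', t]] = E\Big[\sum_{I \in \Gamma} I[x, y, t'] \sum_{J \in \Gamma} J[x', y', t]\Big]. \quad ∎$$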

Next we secure the analog of proposition 2.

Proposition 10

The (spatiotemporal) convolution of two independent microbalanced random stimuli is microbalanced.

Proof

It is convenient to write $\sum_{a_1, \ldots, a_n}$ for a sum in which each of the variables $a_i$ ranges over all integers. For any independent random stimuli $I$ and $J$ and $(x, y, t), (x', y', t') \in \mathbb{Z}^3$,

$$E[I * J[x, y, t]\; I * J[x', y', t']] = E\Big[\sum_{p, q, r} I[x - p, y - q, t - r]\, J[p, q, r] \times \sum_{p', q', r'} I[x' - p', y' - q', t' - r']\, J[p', q', r']\Big]$$

$$= \sum_{p, q, r, p', q', r'} E[I[x - p, y - q, t - r]\, I[x' - p', y' - q', t' - r']] \times E[J[p, q, r]\, J[p', q', r']].$$

But if, in addition, $I$ and $J$ are microbalanced, then this last sum is equal to

$$\sum_{p, q, r, p', q', r'} E[I[x - p, y - q, t' - r']\, I[x' - p', y' - q', t - r]] \times E[J[p, q, r']\, J[p', q', r]]$$

$$= E\Big[\sum_{p, q, r'} I[x - p, y - q, t' - r']\, J[p, q, r'] \times \sum_{p', q', r} I[x' - p', y' - q', t - r]\, J[p', q', r]\Big]$$

$$= E[I * J[x, y, t']\; I * J[x', y', t]]. \quad ∎$$


Response of Reichardt Detectors to Microbalanced Random Stimuli

Two Fourier-analytic motion detectors proposed for psychophysical data can be recast as variants of an ERD. The ERD has many useful properties as a motion detector without regard to its specific instantiation.

Figure 6 shows a diagram of the ERD. It consists of spatial receptors characterized by spatial functions $f_1$ and $f_2$, temporal filters $g_1 *$ and $g_2 *$, multipliers, an adder, and another temporal filter $h *$. The spatial receptors $f_i$ ($i = 1, 2$) act on the input stimulus $I$ to produce intermediate outputs,

$$y_i[t] = \sum_{(x, y) \in \mathbb{Z}^2} f_i[x, y]\, I[x, y, t].$$

At the next stage, each temporal filter $g_j *$ transforms its input $y_i$ ($i, j = 1, 2$), yielding four temporal output functions $g_j * y_i$. The left and right multipliers then compute


C. Chubb and G. Sperling, Vol. 5, No. 11/November 1988/J. Opt. Soc. Am. A 2001


Fig. 6. Diagram of the ERD. Let $I$ be a random stimulus; then, in response to $I$, for $i = 1, 2$, the box containing the spatial function $f_i: \mathbb{Z}^2 \to \mathbb{R}$ outputs the temporal function $\sum_{(x, y) \in \mathbb{Z}^2} f_i[x, y]\, I[x, y, t]$; each of the boxes marked $g_i *$ outputs the convolution of its input with the temporal function $g_i: \mathbb{Z} \to \mathbb{R}$; each of the boxes marked with a $\times$ outputs the product of its inputs; the box marked with a $-$ outputs its left input minus its right; and the box containing $h *$ outputs the convolution of its input with the temporal function $h: \mathbb{Z} \to \mathbb{R}$.


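$$D[B] = \sum_{t, u, x, y, p, q} g_1[u]\, g_2[t]\, f_1[x, y]\, f_2[p, q] \times \big[I[x, y, B - u]\, I[p, q, B - t] - I[x, y, B - t]\, I[p, q, B - u]\big].$$

However, if $I$ is microbalanced, then (by definition 6) the expectation of the square-bracketed difference is 0, and hence $E[D[B]] = 0$ for any $B \in \mathbb{Z}$, implying the following proposition.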

Proposition 11

The expected response of any elaborated Reichardt detector to any microbalanced random stimulus is 0 at every instant in time.
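Proposition 11 can be illustrated by computing the expected differencer output exactly for a small ensemble. The sketch below makes several simplifying assumptions that are not in the text: space is one-dimensional, the final smoothing filter $h$ is omitted, the receptor and filter weights are arbitrary hypothetical values, and the random stimulus is a stepping bar with an independent fair $\pm 1$ contrast in each frame.

```python
import itertools
from fractions import Fraction


def erd_difference(stim, f1, f2, g1, g2, times):
    """Differencer output D[B] of a one-dimensional ERD for each B in times."""

    def receptor(f, t):
        # y_i[t]: receptor-weighted sum of the stimulus samples in frame t.
        return sum(w * stim.get((x, t), 0) for x, w in f.items())

    def filtered(g, f, b):
        # (g * y)[b]: temporal convolution of a receptor output with g.
        return sum(w * receptor(f, b - lag) for lag, w in g.items())

    return {b: filtered(g1, f1, b) * filtered(g2, f2, b)
               - filtered(g1, f2, b) * filtered(g2, f1, b)
            for b in times}


def expected_difference(ensemble, f1, f2, g1, g2, times):
    """E[D[B]] over a finite ensemble of (probability, realization) pairs."""
    total = {b: Fraction(0) for b in times}
    for prob, stim in ensemble:
        for b, d in erd_difference(stim, f1, f2, g1, g2, times).items():
            total[b] += prob * d
    return total


def stepping_bar(contrasts):
    """One-pixel bar at x = t whose contrast in frame t is contrasts[t]."""
    return {(t, t): c for t, c in enumerate(contrasts)}


# Hypothetical receptors and filters, chosen only for illustration.
f1 = {0: Fraction(1), 1: Fraction(1, 2)}   # left receptor weights by x
f2 = {1: Fraction(1), 2: Fraction(1, 2)}   # right receptor weights by x
g1 = {1: Fraction(1), 2: Fraction(1, 4)}   # delaying filter, weights by lag
g2 = {0: Fraction(1)}                      # non-delaying filter
times = range(5)

plain = [(Fraction(1), stepping_bar((1, 1, 1)))]
signs = list(itertools.product((1, -1), repeat=3))
reversing = [(Fraction(1, len(signs)), stepping_bar(s)) for s in signs]

plain_response = expected_difference(plain, f1, f2, g1, g2, times)
reversing_response = expected_difference(reversing, f1, f2, g1, g2, times)
print(plain_response)
print(reversing_response)
```

The nonrandom bar drives the differencer away from 0 at some instants, whereas the random-contrast bar, which is microbalanced, yields an expected output of exactly 0 at every instant.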

Microbalanced random stimuli, then, compose a subclass of drift-balanced random stimuli with special importance for the investigation of non-Fourier motion perception. In general, the fact that a random stimulus $I$ is drift balanced does not entail that all local areas of $I$ be drift balanced; that is, the window over which the Fourier power analysis of $I$ is carried out is critical to the drift-balancedness of $I$. This constraint is escaped by microbalanced random stimuli (as a consequence of proposition 8): a random stimulus $I$ is microbalanced iff, for any space-time-separable function $W$, the random stimulus $WI$ (the result of windowing $I$ by $W$) is drift balanced.


$[g_1 * y_1[t]][g_2 * y_2[t]]$ and $[g_1 * y_2[t]][g_2 * y_1[t]]$, respectively, and the differencer subtracts the output of the right multiplier from that of the left multiplier:

$$D[t] = [g_1 * y_1[t]][g_2 * y_2[t]] - [g_1 * y_2[t]][g_2 * y_1[t]].$$

The final output is produced by applying the filter $h *$, whose purpose is to appropriately smooth the time-varying differencer output $D$.

In the following discussion, we write $\sum_{a_1, \ldots, a_n}$ for a sum in which each of the variables $a_i$ ranges over all integers. Given a random stimulus $I$ as the input to the ERD, the output of the differencing component at time $B$ is

9. RECOVERY OF MOTION FROM MICROBALANCED RANDOM STIMULI

Nonlinear Transformations Hypothesis

The most plausible explanation for the recovery of motion from drift-balanced random stimuli posits one or more nonlinear transformations that are routinely applied to the visual input signal to generate a new signal, which is then subjected to ordinary frequency-domain power analysis.

Consider, for instance, random stimuli such as those described in demonstrations 1, 2, and 5 (Figs. 4a, 4b, and 4e), whose motion depends on spatiotemporal modulation of noise contrast. For concreteness, we focus on $I$, the contrast-reversing bar of demonstration 1 (Fig. 4a). The apparent motion exhibited by $I$ might result from a power analysis in the frequency domain of a rectified version of the original signal: for example, a transformation of the signal $I$ such as $R_I$, $S_I$, $T_I^+$, or $T_I^-$, where


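$$D[B] = \Big[\sum_u g_1[u] \sum_{x, y} f_1[x, y]\, I[x, y, B - u]\Big] \Big[\sum_t g_2[t] \sum_{p, q} f_2[p, q]\, I[p, q, B - t]\Big] - \Big[\sum_u g_1[u] \sum_{p, q} f_2[p, q]\, I[p, q, B - u]\Big] \Big[\sum_t g_2[t] \sum_{x, y} f_1[x, y]\, I[x, y, B - t]\Big],$$

which can be rewritten as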


(i) $R_I[x, y, t] = |I[x, y, t]|$ (full-wave rectification),
(ii) $S_I[x, y, t] = I[x, y, t]^2$ (full-wave power rectification),
(iii) $T_I^+[x, y, t] = \max\{I[x, y, t], 0\}$ (positive half-wave rectification),
(iv) $T_I^-[x, y, t] = \min\{I[x, y, t], 0\}$ (negative half-wave rectification).
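The four transformations act pointwise, so they are easy to try on a toy stimulus. The sketch below is a hypothetical illustration: it assumes a one-pixel bar that steps one position per frame with contrast $+1$ or $-1$, stored sparsely so that background samples (value 0, which every transformation leaves at 0) are omitted. The function names are not from the paper.

```python
import itertools


def full_wave(v):           # (i)   |I|
    return abs(v)


def full_wave_power(v):     # (ii)  I squared
    return v * v


def positive_half_wave(v):  # (iii) max{I, 0}
    return max(v, 0)


def negative_half_wave(v):  # (iv)  min{I, 0}
    return min(v, 0)


def rectify(stim, transform):
    """Apply a pointwise transformation to every stored sample."""
    return {point: transform(v) for point, v in stim.items()}


def stepping_bar(contrasts):
    """One-pixel bar at x = t whose contrast in frame t is contrasts[t]."""
    return {(t, 0, t): c for t, c in enumerate(contrasts)}


steady_bar = stepping_bar((1, 1, 1, 1))
realizations = [stepping_bar(s)
                for s in itertools.product((1, -1), repeat=4)]

# Full-wave and power rectification map every realization to the same bar.
always_steady = all(rectify(r, full_wave) == steady_bar
                    and rectify(r, full_wave_power) == steady_bar
                    for r in realizations)

# Positive half-wave rectification keeps the bar only in its +1 frames.
example = stepping_bar((1, -1, -1, 1))
print(always_steady)
print(rectify(example, positive_half_wave))
```

Under these assumptions the full-wave outputs no longer depend on the random contrasts, which is why ordinary motion analysis applied after rectification can respond to the bar's displacement.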

$R_I$ and $S_I$ both transform $I$ into a rectangle moving in a series of brief steps from left to right, while $T_I^+$ and $T_I^-$ map $I$ into a similar such moving rectangle, which randomly disappears and reappears in the course of its left-to-right traversal. The MFFC principle applied to any of these transformations of $I$ would indicate motion to the right (see Fig. 7a). In the realm of spatial visual perception, rectification transformations were proposed by various authors to mediate boundary formation and texture segregation. Logarithmic intensity compression was also proposed because of its physiological plausibility, although it is less effective than rectification.

Fig. 7. Consequences of full-wave and half-wave rectification. a, Space-time representation of a traveling, contrast-reversing bar; full-wave (fw) rectified representation, and positive (hw+) and negative (hw-) half-wave rectified representations, showing that either of these rectifications suffices to expose the motion to Fourier motion energy analysis. b, Space-time representation of a traveling contrast reversal of a random bar pattern; full-wave (fw) rectified representation, positive (hw+) and negative (hw-) half-wave rectified representations, showing that none of these rectifications exposes motion. The analysis system for second-order motion stimuli is shown in the bottom row: c, the signal is linearly filtered (the impulse response of an appropriate space-time-separable linear filter is shown); d, the filtered signal is full-wave rectified; and e, it is subjected to motion energy analysis (e.g., by an ERD). This is a sufficient sequence of operations to expose the directional motion in all the demonstrations of this paper.

Although any one of the rectification transformers would expose the motion information buried in $I$ to frequency-domain power analysis, the same is not true of the traveling contrast-reversal $J$ defined in demonstration 3 (Fig. 4c). Full-wave rectification of $J$ yields a constant output. Half-wave rectification merely yields another drift-balanced random stimulus: $T_J^+ = (J + 1)/2$ and $T_J^- = (1 - J)/2$. These relations are illustrated in Fig. 7b. The motion of $J$ does not emerge directly from any of these forms of rectification.

For the traveling, random contrast-reversal $J$ (demonstration 3, Fig. 4c), a time-dependent linear operator such as temporal differentiation is required to transform it into a signal from which motion information can be extracted after rectification. (Indeed, the partial derivative of $J$ with respect to time is $I$.)

Consider the space-time-separable bandpass filtering that is usually assumed to occur in low-level visual processing. If such linear filtering were applied to any of the demonstrations considered in this paper, and if it were followed by any of the rectification operations considered above, it would suffice to expose the motion of any of these demonstrations to Fourier power analysis. Figure 7 illustrates the sequence of filtering, rectifying, and motion-power analysis. A central issue concerning drift-balanced random stimuli thus emerges: given the (largely unexplored) range of drift-balanced random stimuli that elicit apparent motion, what is the simplest array of transformations of the input signal that suffices to expose (to frequency-domain power analysis) the motion information carried by all the various types of drift-balanced random stimuli?

What  is  the  Purpose  of  Having  Detectors  for  Drift- 
Balanced  Motion? 

From a systems point of view, there is a problem in linearly combining the information from many linear sensors (for example, motion-sensitive sensors) because there is nothing gained by the combination that could not have been accomplished by a single, large sensor. For an advantage to be gained from the combination, this information must be nonlinearly related to the input. Nonlinearly computed quantities such as power and information are combined most usefully. In many classical detection problems the ideal detector is a power detector; that is, the power of the component elements is summed to form the decision variable. When it comes to detecting motion, it would be surprising if generally similar considerations did not apply in combining information from various locations of the visual field and from detectors of various sizes. Indeed, the MFFC theories normally use motion detectors that compute Fourier power.

Assuming that evolution chooses detection modes because of their advantages, what is surprising about the detection of drift-balanced motion is that the advantages of nonlinear combination are already available at the earliest stages of sensory analysis. Ultimately, to appreciate why this is so requires ecological analysis of the visual world. Obviously, the ecological problem cannot be resolved by armchair speculation. On the other hand, given that combination mechanisms operate with rectified inputs, it is not surprising that the mechanisms that detect drift-balanced motion seem to be of a much larger scale than the Fourier mechanisms. A possibly related observation is that the apparent motion in various drift-balanced random stimuli that we have considered here tends to diminish with the retinal eccentricity of the presentation. However, it remains to be determined how much of this drop-off of apparent motion should be attributed to the effective decrease in visual spatial sampling rate with retinal eccentricity.

10. UTILITY OF RANDOM STIMULI AS A RESEARCH TOOL

A general advantage of random stimuli compared with repeated stimuli is that the responses to a repeated stimulus might be mediated by any of its features, including artifactual stimulus features that are not anticipated by the experimenter. Responses to random stimuli represent the responses to the properties that distinguish a class of stimuli, and these tend to be more general and more readily specifiable than the properties of a single stimulus. Thus, by generalizing the notion of a stimulus to that of a random stimulus, we obtain a much more extensive and adaptable set of tools for studying perception.

In the study of motion perception, microbalanced random stimuli play a crucial role: they avoid the complications introduced by the spatial windowing that is unavoidably performed by motion-perception units. Avoiding the possible artifacts of windowing is particularly important in interpreting the responses of single visual neurons. Only a microbalanced random stimulus is guaranteed to contain no consistent Fourier components, regardless of how that stimulus may be centered or fail to be centered in a given neuron's receptive field or in the observer's field of view. It is possible for drift-balanced (but not microbalanced) random stimuli to produce systematic Fourier motion components in receptive fields of particular neurons that happen to be placed advantageously with respect to those stimuli. Only microbalanced random stimuli necessarily require non-Fourier operations in order to yield motion perception.

An invariant stimulus is microbalanced (thereby avoiding the windowing problem) only if it is space-time separable (proposition 6). Unfortunately, there are no examples of space-time-separable stimuli that yield a strong, consistent perception of motion. Thus random microbalanced stimuli that yield strong perceived motion offer a unique tool for the investigation of non-Fourier motion perception.


11. NON-FOURIER STIMULUS ANALYSIS IN OTHER SENSORY DOMAINS

Spatial Vision

One-dimensional motion stimuli in $(x, t)$ can be represented as two-dimensional stimuli in $(x, y)$. From the point of view of systems analysis, the $(x, t)$ and $(x, y)$ representations are equivalent; motion in $(x, t)$ is equivalent to orientation in $(x, y)$. There are inevitably some physical restrictions that apply in the time domain, so that $x$ and $t$ cannot be so symmetrical with respect to each other as $x$ and $y$. For example, in human motion detectors, summation over time (of comparator output) occurs within a single detector; summation over space occurs between detectors.

The space-time asymmetry in motion can be made obvious by adding two gratings. Thus, when a drifting sine-wave grating of frequency $(\omega_x, \omega_t)$ is added to a stationary sine pattern of frequency $(\omega_x, 0)$ (a standing grating), the apparent motion is normally visible; when it is added to $(0, \omega_t)$ (a uniform, flickering field), the apparent motion may either be normal or be reversed, depending on the phase relations. In the space domain, both combinations are equivalent.

The fact that all the $(x, y)$ spatial illustrations in the figures of $(x, t)$ motions were visible as oriented textures demonstrates that the same or similar nonlinear dynamics are involved in the extraction of orientation as are involved in the extraction of direction of motion. Indeed, we have yet to discover an $(x, t)$ stimulus that is perceived as moving and that is not perceived as oriented texture in an $(x, y)$ representation. This suggests that the human array of pattern-analytic detectors is at least as rich as the motion-analytic array.

Audition 

Obviously, a one-dimensional signal, such as an auditory signal (which depends only on time), cannot be drift balanced. Nonetheless, certain auditory phenomena bear a resemblance to some of the visual effects that we have been considering.

It has long been recognized that the auditory system analyzes sound-pressure waveforms into their component sinusoidal frequencies and that these frequency components correspond, at least to a first approximation, to the sensation of pitch. Indeed, the cochlea functions largely as a mechanical frequency analyzer. In addition to pure frequency analysis, especially at periodicities below 300 Hz, another mechanism, periodicity analysis, also comes into play. One of the best demonstrations is an experiment by Miller and Taylor.

Some background facts about this experiment are useful here. A broad-spectrum noise $N$ is a random function of time such that the expected power of all Fourier components in $N$ is equal. It is easy to show that any random function $N$ that assigns pairwise independent random variables, all with mean 0, to distinct points in time is a broad-spectrum noise. Obviously, multiplying any such random function $N$ by an arbitrary nonrandom function $f$ yields yet another broad-spectrum noise, since the values assigned by $fN$ remain pairwise independent, each with mean 0.
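The flatness of the expected spectrum of a modulated noise can be confirmed exactly on a short discrete signal. This is a minimal sketch under stated assumptions: the noise takes independent, equiprobable values $+1$ and $-1$ at each sample, the on/off modulator is an arbitrary hypothetical pattern, and the expectation is computed by enumerating every noise realization.

```python
import cmath
import itertools


def power(signal, omega):
    """|sum_t signal[t] * exp(-j * omega * t)|^2 for a finite signal."""
    return abs(sum(v * cmath.exp(-1j * omega * t)
                   for t, v in enumerate(signal))) ** 2


def expected_power(modulator, omega):
    """Exact E[power] of modulator * N over all equiprobable +1/-1 noises."""
    noises = list(itertools.product((1, -1), repeat=len(modulator)))
    return sum(power([m * n for m, n in zip(modulator, noise)], omega)
               for noise in noises) / len(noises)


square_wave = [1, 1, 0, 0, 1, 1, 0, 0]  # hypothetical on/off interruption
omegas = [0.0, 0.4, 1.1, 2.0, 3.0]      # arbitrary test frequencies

spectrum = [expected_power(square_wave, w) for w in omegas]
energy = sum(m * m for m in square_wave)
print(spectrum, energy)
```

Every entry of `spectrum` equals the summed squared modulator values up to rounding, because the cross terms between distinct samples have expectation 0. The interruption pattern therefore leaves no trace in the expected power spectrum.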

In the experiment by Miller and Taylor, listeners heard a broad-spectrum noise that was modulated on and off (multiplied) by a square wave of frequency $f$. Thus the stimulus generated by Miller and Taylor had a uniform expected power over all temporal frequencies. When $f$ was less than ~10 Hz, the perception corresponded to the physical reality of interrupted noise. At frequencies between 40 and 200 Hz, the interrupted noise was perceived to have a pitch that corresponded to the interruption frequency. That observers perceive a pitch implicates some mechanism other than frequency analysis. Whereas a rectifying nonlinearity was not proposed explicitly by Miller and Taylor, it is the obvious intermediate step in periodicity pitch perception.

12.  FINAL  REMARKS 

We have given precise definition to the notion of a random stimulus and focused our attention on the subclasses of drift-balanced and microbalanced random stimuli as being especially interesting for the study of visual perception. We first showed that the (spatiotemporal) convolution of independent drift-balanced random stimuli is drift balanced.

Proposition 3 (which states that the sum of drift-balanced random stimuli is drift balanced when the elements are pairwise independent and all but at most one have expectation 0, the non-0 element being invariant) and proposition 9 (which states a similar result for microbalanced random stimuli) provide access to a large family of empirically useful drift-balanced random stimuli. Instances that display striking apparent motion may be constructed readily.

In Section 8 we introduced microbalanced random stimuli, a distinguished subclass of drift-balanced random stimuli defined by the following property: a random stimulus $I$ is microbalanced iff, for any space-time-separable function $W$, the product $WI$ is drift balanced. Thus $I$ is guaranteed to avoid systematically stimulating any Fourier power motion mechanisms encountering $I$ through any space-time-separable window. It was proved that (proposition 5) any space-time-separable random stimulus is microbalanced; that (proposition 6) any invariant microbalanced stimulus is space-time separable; that (proposition 7) the product of two independent microbalanced random stimuli is microbalanced; that (proposition 9) any linear combination of pairwise independent microbalanced random stimuli, all but at most one of which has expectation 0, is microbalanced; and that (proposition 10) the spatiotemporal convolution of two independent microbalanced random stimuli is microbalanced. An implication of proposition 9 is that all the demonstration stimuli presented in this paper are not only drift balanced but also microbalanced. Finally (in proposition 11), we showed that the expected response of any elaborated Reichardt detector to any microbalanced random stimulus is 0 at any instant in time.

In light of earlier observations, the existence of non-Fourier mechanisms is hardly surprising. Such mechanisms have, however, received no thorough investigation. The range of types of such mechanisms has not yet been elaborated, and their psychophysical properties remain largely unstudied. The importance of proposition 3 and the results of Section 8 lies in their utility for constructing stimuli for probing both the nature of non-Fourier motion-detection mechanisms and the interaction between such mechanisms and the band-tuned motion detectors that were the focus of most previous research.

APPENDIX  A 

In this appendix we verify that $E[|\hat{I}(\omega, \theta, \tau)|^2]$ exists for any random stimulus $I$ and any $\omega, \theta, \tau \in \mathbb{R}$ (which was presumed in definition 2). Let $D = \{(x, y, t) \in \mathbb{Z}^3 \mid I[x, y, t] \neq 0\}$; then

$$E[|\hat{I}(\omega, \theta, \tau)|^2] = \int \sum i[x, y, t]\, i[p, q, r]\, \exp(j(\omega(x - p) + \theta(y - q) + \tau(t - r)))\, f(i)\, di$$

$$= \sum \left( \int i[x, y, t]\, i[p, q, r]\, f(i)\, di \right) \exp(j(\omega(x - p) + \theta(y - q) + \tau(t - r))),$$

where each sum ranges over all pairs of points $(x, y, t), (p, q, r) \in \mathbb{Z}^3$. Note now that

$$\int i[x, y, t]\, i[p, q, r]\, f(i)\, di = E[I[x, y, t]\, I[p, q, r]].$$

However, as a consequence of the (probabilistic version of the) Schwarz inequality, we note that

$$|E[I[x, y, t]\, I[p, q, r]]| \le (E[I[x, y, t]^2]\, E[I[p, q, r]^2])^{1/2}.$$

However, by the definition of a random stimulus, the two expectations on the right-hand side of the inequality exist. Hence $E[|\hat{I}(\omega, \theta, \tau)|^2]$ exists for all $\omega, \theta, \tau \in \mathbb{R}$.

APPENDIX B

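In this appendix we prove lemma 1, which is as follows.
Let $S$ be a random stimulus equal to the sum of a set $\Omega$ of pairwise independent random stimuli; then

$$E[|\hat{S}|^2] = \Big|\sum_{I \in \Omega} \hat{E}_I\Big|^2 + \sum_{I \in \Omega} E[|\hat{\eta}_I|^2],$$

where $\eta_I = I - E_I$ for each $I \in \Omega$.

First we write

$$S = \sum_{I \in \Omega} (E_I + \eta_I).$$

The linearity of Fourier transformation then yields

$$\hat{S} = \sum_{I \in \Omega} (\hat{E}_I + \hat{\eta}_I).$$

Thus

$$|\hat{S}|^2 = \sum \big[\hat{E}_I(\hat{E}_J)^* + \hat{\eta}_I(\hat{\eta}_J)^* + \hat{E}_I(\hat{\eta}_J)^* + \hat{\eta}_I(\hat{E}_J)^*\big],$$

where the sum is over all $I, J \in \Omega$.

Note first, however, that, whenever $I \neq J$,

$$E\big[\hat{E}_I(\hat{E}_J)^* + \hat{\eta}_I(\hat{\eta}_J)^* + \hat{E}_I(\hat{\eta}_J)^* + \hat{\eta}_I(\hat{E}_J)^*\big] = \hat{E}_I(\hat{E}_J)^*,$$

since $I$ and $J$ are independent and $E[\hat{\eta}_I] = E[\hat{\eta}_J] = 0$.

Moreover, whenever $I = J$,

$$E\big[\hat{E}_I(\hat{E}_I)^* + \hat{\eta}_I(\hat{\eta}_I)^* + \hat{E}_I(\hat{\eta}_I)^* + \hat{\eta}_I(\hat{E}_I)^*\big] = \hat{E}_I(\hat{E}_I)^* + E[\hat{\eta}_I(\hat{\eta}_I)^*].$$

Thus

$$E[|\hat{S}|^2] = \sum_{I \in \Omega} \sum_{J \in \Omega} \hat{E}_I(\hat{E}_J)^* + \sum_{I \in \Omega} E[|\hat{\eta}_I|^2] = \Big|\sum_{I \in \Omega} \hat{E}_I\Big|^2 + \sum_{I \in \Omega} E[|\hat{\eta}_I|^2]. \quad ∎$$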
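The decomposition in lemma 1 (expected power equals the power of the summed means plus the summed expected powers of the deviations) can be checked numerically. The following is only a sketch under simplifying assumptions that are not in the paper: stimuli are one-dimensional sequences instead of functions of (x, y, t), each random stimulus takes finitely many values, and the components are jointly independent (which implies pairwise independence). The helper names `dft` and `lemma1_sides` are hypothetical.

```python
import cmath
from itertools import product


def dft(signal):
    """Naive 1-D discrete Fourier transform of a short real sequence."""
    n = len(signal)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * x / n)
                for x, v in enumerate(signal))
            for k in range(n)]


def lemma1_sides(components):
    """Evaluate both sides of the lemma for finite-valued random stimuli.

    components: list of independent random stimuli; each is a list of
    (probability, pattern) outcomes, all patterns having the same length.
    Returns (lhs, rhs), one value per frequency bin:
      lhs = E[|S^|^2], computed by enumerating every joint outcome;
      rhs = |sum of transformed means|^2 + sum of E[|transformed deviation|^2].
    """
    n = len(components[0][0][1])

    # Left-hand side: enumerate joint outcomes of the independent components.
    lhs = [0.0] * n
    for outcome in product(*components):
        prob = 1.0
        total = [0.0] * n
        for p, pattern in outcome:
            prob *= p
            total = [a + b for a, b in zip(total, pattern)]
        for k, c in enumerate(dft(total)):
            lhs[k] += prob * abs(c) ** 2

    # Right-hand side: means and deviations of each component separately.
    mean_sum = [0.0] * n
    deviation_power = [0.0] * n
    for comp in components:
        mean = [sum(p * pattern[x] for p, pattern in comp) for x in range(n)]
        mean_sum = [a + b for a, b in zip(mean_sum, mean)]
        for p, pattern in comp:
            eta = [v - m for v, m in zip(pattern, mean)]
            for k, c in enumerate(dft(eta)):
                deviation_power[k] += p * abs(c) ** 2
    rhs = [abs(c) ** 2 + q for c, q in zip(dft(mean_sum), deviation_power)]
    return lhs, rhs
```

Calling `lemma1_sides` on a few small hand-made components should return two lists that agree to within floating-point error at every frequency bin.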


APPENDIX  C 

In this appendix we prove that the random stimuli G and H of demonstrations 5 and 4 are drift balanced. These random stimuli stem from proposition 3. To make the bridge explicit, we shall need to derive a corollary (C1) that depends on the following lemma.

Lemma C1

For $M \in \mathbb{Z}^+$, let the random variables $\rho_0, \rho_1, \ldots, \rho_{M-1}$ be pairwise independent, each uniformly distributed on $[-\pi, \pi)$; then, for any $x, y, t \in \mathbb{Z}$, define the random stimulus $I$ by setting

$$I[x, y, t] = \sum_{m=0}^{M-1} d_m[x, y]\,(\cos(\rho_m)\, h_m[t] - \sin(\rho_m)\, k_m[t]),$$

where, in each case, $d_m$, $h_m$, and $k_m$ are all real-valued functions that equal zero at all but a finite number of points of their respective domains. $I$ is then drift balanced.

Proof

For $m = 0, 1, \ldots, M - 1$, term $m$ of $I$ is space-time separable and hence drift balanced. Moreover, for each $m$, the expectations of $\sin(\rho_m)$ and $\cos(\rho_m)$ are both 0. Thus the expectation of each term of the sum yielding $I$ is 0, and the result follows from proposition 3. ∎

We apply lemma C1 to prove the following corollary used in constructing stimuli for demonstrations 4 and 5.

Corollary C1

For $M, N \in \mathbb{Z}^+$, let $\rho_0, \ldots, \rho_{M-1}$ be pairwise independent random variables, each uniformly distributed on $[-\pi, \pi)$; then, for any $x, y, t \in \mathbb{Z}$, define the random stimulus $I$ by setting

$$I[x, y, t] = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} d_m[x, y]\, p_{m,n}[t]\, \cos(q_{m,n}[t] + \rho_m),$$

where, for $m = 0, 1, \ldots, M - 1$, and $n = 0, 1, \ldots, N - 1$, the functions $d_m$, $p_{m,n}$, and $q_{m,n}$ are real valued and zero at all but a finite number of points of their corresponding domains. $I$ is then drift balanced.

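Proof

We recast $I$ so as to apply lemma C1:

$$I[x, y, t] = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} d_m[x, y]\, p_{m,n}[t]\,(\cos(q_{m,n}[t])\cos(\rho_m) - \sin(q_{m,n}[t])\sin(\rho_m))$$

$$= \sum_{m=0}^{M-1} d_m[x, y]\,(h_m[t]\cos(\rho_m) - k_m[t]\sin(\rho_m))$$

for

$$h_m[t] = \sum_{n=0}^{N-1} p_{m,n}[t]\cos(q_{m,n}[t]),$$

$$k_m[t] = \sum_{n=0}^{N-1} p_{m,n}[t]\sin(q_{m,n}[t]). \quad ∎$$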
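The recasting step rests on the identity cos(q + ρ) = cos(q)cos(ρ) − sin(q)sin(ρ), applied term by term and then regrouped by m. As a hedged illustration, the sketch below evaluates the double-sum form of the corollary and the regrouped lemma-C1 form for arbitrary functions and compares them numerically. The function names and the way stimuli are passed in (lists of Python callables) are assumptions made for this example only.

```python
import math


def corollary_sum(d, p, q, rho, xy, t):
    """Double sum: sum over m, n of d[m](xy) * p[m][n](t) * cos(q[m][n](t) + rho[m])."""
    return sum(d[m](xy) * p[m][n](t) * math.cos(q[m][n](t) + rho[m])
               for m in range(len(d))
               for n in range(len(p[m])))


def lemma_form(d, p, q, rho, xy, t):
    """Regrouped form: sum over m of d[m](xy) * (h_m(t) cos(rho_m) - k_m(t) sin(rho_m))."""
    total = 0.0
    for m in range(len(d)):
        # h_m and k_m collect the temporal factors for rectangle m.
        h = sum(p[m][n](t) * math.cos(q[m][n](t)) for n in range(len(p[m])))
        k = sum(p[m][n](t) * math.sin(q[m][n](t)) for n in range(len(p[m])))
        total += d[m](xy) * (h * math.cos(rho[m]) - k * math.sin(rho[m]))
    return total
```

For any fixed phases `rho`, the two functions should return the same value at every point, which is all the recasting claims; drift balance itself then follows from the lemma, not from this check.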

Proof That H (Demonstration 4) Is Drift Balanced
H contains N frame blocks indexed 0, 1, ..., N − 1, each composed of M rectangles indexed 0, 1, ..., M − 1 from left to right. Let $\rho_0, \rho_1, \ldots, \rho_{M-1}$ be pairwise independent random variables, each uniformly distributed on $[-\pi, \pi)$. Let C be a contrast value. We can express H as follows. For m = 0, 1, ..., M − 1, let $d_m[x, y] = 1$ for (x, y) in the mth rectangle and 0 elsewhere, and for n = 0, 1, ..., N − 1, let $g_n[t] = 1$ in the nth frame block and 0 elsewhere; then
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m«>0  A*0 

e)))  > 

To check that H is drift balanced, make the following identifications, and then apply corollary C1:

$$p_{m,n}[t] = C g_n[t]$$

and 

Thus corollary C1 applies, and we conclude that H is drift balanced. (Note that H does not exploit the full generality of corollary C1, since, for these identifications, $p_{m,n}[t]$ does not depend on m and $q_{m,n}[t]$ does not depend on t.)

Proof That G (Demonstration 5) Is Drift Balanced
The random stimulus G is made up of N frame blocks indexed 0, 1, ..., N − 1, each containing M rectangles indexed 0, 1, ..., M − 1 from left to right. Let $\rho_0, \rho_1, \ldots, \rho_{M-1}$ be pairwise independent random variables, each uniformly distributed on $[-\pi, \pi)$. Let C be some contrast value. We can then express G as follows. For m = 0, 1, ..., M − 1, let $d_m[x, y] = 1$ for (x, y) in the mth rectangle and 0 elsewhere; for n = 0, 1, ..., N − 1, let $g_n[t] = 1$ for t in the nth frame block and 0 elsewhere; then

N-l 

Clu.y,  <1  “ 

KK“  E  9)  +'’")■ 

To  see  that  G  is  drift  balanced,  set 

and 

and apply corollary C1. (Note that, as with H, G does not exploit the full generality of corollary C1, since one of the identified functions depends on neither m nor t.)

Dynamic images of individual signs of American Sign Language (ASL) with a resolution of 96 × 64 pixels were bandpass filtered in adjacent bands. Intelligibility was determined by testing deaf subjects fluent in ASL. The following results were obtained. (1) By iteratively varying the center frequencies and bandwidths of the spatial bandpass filters, it was possible to divide the original signal into four different component bands of high intelligibility. (2) The measured temporal-frequency spectrum was approximately the same in all bands. (3) The masking of signals in band i by noise in band j was found to be inversely proportional to the frequency separation between the bands. At constant performance, the ratio of root-mean-square signal amplitude to noise amplitude, s/n, was the same for bands 2, 3, and 4 and higher for band 1. (4) When weak signals i and j were added linearly, there was a slight intelligibility advantage for signals in the same band (i = j) compared with signals in adjacent bands and for signals in adjacent bands compared with signals in distant bands.


INTRODUCTION 

Much has been learned about how the spatial-frequency components of simple visual stimuli, in combination, contribute to visual responses. Most of what we know is concerned with simple stimuli near their threshold. For example, there is ample evidence that multiple channels (mechanisms) are involved in the detection of simple visual stimuli, with different channels at different retinal spatial frequencies (Ref. 2). It is believed that, at threshold, these channels sum their information probabilistically. Whether a channel that subserves one spatial frequency inhibits channels that subserve other frequencies is unclear; different results are reported for different procedures.

Much of the visual research is concerned with spatial frequencies as they are produced at the retina. The discriminability of stimuli that are well above threshold, and explicitly limited by external noise, is independent of viewing distance (retinal angle) over a wide range. Noisy signals are discriminated equally at vastly different retinal frequencies, and their perceptual properties are best characterized by cycles per object rather than cycles per degree of visual angle.

In a visual communication channel for complex, dynamic visual stimuli, such as American Sign Language (ASL), the limitations are related to stimulus noise and to stimulus subsampling rather than to low contrast; that is, the intelligibility of these ASL stimuli is limited by external distortions, modeled as noise, rather than by internal noise. Such limitations will probably be characterized by object spatial frequencies, and almost none of the previous literature on spatial-frequency interactions in vision is directly applicable. Therefore, to design optimal communication channels for transmitting dynamic complex stimuli, there is no alternative to studying them directly.

From a practical point of view, visual communication channels would be immediately useful to the several hundred thousand hearing-impaired individuals who rely on ASL for communication. More than two million Americans are unable to understand speech even with a hearing aid; many of these would benefit from having a visual communication channel to aid their utilization of residual hearing. The problem is that available, affordable channel capacity is limited, and compressing images to utilize this capacity efficiently requires a better understanding of how frequency components of complex images contribute to their intelligibility as well as better methods of image compression.

This study is concerned with how the visual information in component spatial-frequency bands of a complex visual signal, ASL, combines to facilitate or to interfere with the intelligibility of ASL. Therefore, first we attempt to establish four spatial-frequency bands having approximately equal intelligibility for ASL. Second, we measure the temporal characteristics of each of these bands. Third, we study how various intensities of noise in frequency band i interfere with signals in band j. Fourth, we determine how weak signals in band i combine with weak signals in band j to facilitate perception.

EXPERIMENT 1: BANDS OF EQUAL
INTELLIGIBILITY

The purpose of experiment 1 is to derive a number of spatial-frequency filters to produce bandpass ASL stimuli from the original ASL stimuli. Each band should have approximately equal, and moderately high, intelligibility. Preliminary work suggested that four such bands would be possible for our stimuli.

Method 

Original Stimuli

The stimuli consisted of isolated ASL signs displayed at 30 frames per second (fps) on a television raster monitor. Signs took 2-3 sec and consisted of 60-90 frames. A stan-

Fig. 3. Intelligibility (percentage of correct ASL identifications) as a function of the spatial-frequency band. Curve labeled INITIAL was obtained in experiment 1a with the filter set at top of Fig. 1; curve labeled FINAL was obtained in experiment 1b with filters at the bottom of Fig. 1 and with improved stimuli.


procedure by a proficient signer. The signs were run in blocks (by frequency band) so that the signer would be maximally prepared for the type of stimulus to be shown on a trial.

Results 

The average percentages of correct responses in each band are shown in Table 1. As can be seen, performance improves with increasing frequency, from 3S% in band 1 to SO% in band 4.

Experiment 1b: Filter Set 3
Procedure

Filter set 1 did not generate equally intelligible bands. Therefore the filters were changed according to an algorithm that estimated the contribution to intelligibility of every component frequency and attempted to distribute these contributions equally among the bands. In addition to intelligibility differences among bands in experiment 1a, we noted that there were some unfamiliar signs and that these may not have been distributed equally among groups. Therefore, for subsequent tests, 28 ambiguous signs were discarded. The remaining 72 signs were divided into four groups and were tested as before. Subsequently, the filters were again adjusted by an algorithm to increase the bandwidth of the bands with the worst performance and to diminish the bandwidth of the bands with the best performance. The final filters are shown in Fig. 1, and examples of the filtered stimuli are illustrated in Fig. 2.
To make the intelligibility test more accurate, data collected up to this point were used to rank the signs into three categories: easy, medium, and difficult. Signs in each category were distributed evenly into band conditions. Further, a balanced Latin square block design was used so that each sign was processed in each frequency band; i.e., four complete stimulus video tapes were prepared, each of which contained all the experimental ASL signs but distributed into different filter groups. Eight subjects were run, two subjects for each cell of the Latin square.

Results 

Filter set 3 yielded four bands with intelligibilities that were more nearly equal than those of filter set 1, but intelligibility was still not completely uniform across bands. Intelligibility ranged from 66% in band 1 to 87% in band 3 (Fig. 3). Although the four bands of filter set 3 were not equally intelligible, they were sufficiently close to equal that we could move forward with the main experiments to investigate how signals in different bands interfere with and facilitate one another.

EXPERIMENT 2: THE TEMPORAL-FREQUENCY
SPECTRUM

Here we address the question: What is the temporal power spectrum of the signal in each of the spatial bands derived in experiment 1? This question is of interest in its own right in terms of discovering the correlation of spatial and temporal frequencies in the environment and therefore in defining the optimal visual detectors for operating in this environment. More immediately, we will need the temporal data in experiment 3 to create dynamic visual noise that is matched to the spatially band-limited ASL signals in both spatial and temporal frequency.

To determine the signal power as a function of temporal frequency, eight representative ASL signs were selected. At the mean spatial frequency $\omega_i$ of each spatial filter i of experiment 1 (see Table 1, column 6), a small spatial-frequency range $\Delta\omega_i$ ($\Delta\omega_i = \{(\omega_x, \omega_y) \mid \omega_i - \epsilon \le |\omega| < \omega_i + \epsilon\}$) was selected for analysis. This is the range of spatial frequencies that best characterizes its spatial-frequency band.


Fig. 4. The temporal power spectrum of ASL in spatial-frequency bands 1-4. The abscissa represents the temporal frequency in hertz (log scale); the maximum frequency of 15 Hz is determined by the frame rate of 30 Hz. The ordinate represents the average power in an annular band of temporal frequencies extracted from a three-dimensional (x, y, t) Fourier analysis of eight representative ASL sign sequences. The line of slope −1 is drawn for reference.

T. R. Riedl and G. Sperling    Vol. 5, No. 4/April 1988/J. Opt. Soc. Am. A    609

The spatial range $\Delta\omega_i$ is an annulus in spatial-frequency space and a hollow cylinder in $(\omega_x, \omega_y, \omega_t)$ spatiotemporal-frequency space. For every small range of temporal frequencies df within $\Delta\omega_i$, the average power (over the eight signs) was computed at each spatiotemporal frequency (annular cross section of the cylinder). The whole computation was repeated for each of four spatial bands i. These data (temporal power versus temporal frequency, for each of the four spatial frequencies $\omega_i$) are displayed in Fig. 4.

Overall temporal power diminishes with increasing spatial frequency. Within each spatial-frequency band, temporal power falls off with an approximately constant initial slope on the graph of log10 (power) versus log10 (frequency) (compare the reference line in Fig. 4), leveling off at high temporal frequencies. The approximate parallelism of the temporal-frequency power curves (for different spatial frequencies) suggests that the temporal-frequency composition of our ASL stimuli is independent of their spatial composition.
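The slope of a power curve on log-log axes, and hence the parallelism of curves from different spatial bands, can be quantified with an ordinary least-squares fit. The sketch below is illustrative only: the function name `loglog_slope` is hypothetical, and the paper does not state how (or whether) its slopes were fitted.

```python
import math


def loglog_slope(freqs, powers):
    """Least-squares slope of log10(power) plotted against log10(frequency).

    freqs and powers are equal-length sequences of positive numbers.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A spectrum whose power is proportional to 1/f gives a slope of −1, the slope of the reference line drawn in Fig. 4; two bands whose curves are parallel would give equal slopes even when their overall power levels differ.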

EXPERIMENT 3: CROSS-BAND MASKING BY
NOISE

Typically, cross-band masking has been studied with simple static signals rather than with realistic dynamic stimuli. The purpose of experiment 3 is to determine the extent to which dynamic noise in spatial-frequency band j interferes with dynamic ASL signals in band i, for all 16 combinations of i, j = 1, 2, 3, 4. Basically, this requires determining the performance versus the signal-to-noise ratio in each of the 16 different band combinations. Because at least half a dozen values of s/n must be sampled to determine a performance function, this experiment requires determination of the performance in almost 100 conditions. Since it is impractical to create and maintain a stimulus set of ASL signs large enough for this immense task, a rating procedure was used instead that involved intelligibility judgments of only two representative ASL signs.

Method 

Stimuli

The signals were the recorded ASL signs "home" and "flower" from the previously described set. They were filtered in each of the four bands determined by filter set 3 of experiment 1 (Fig. 1). To generate noise stimuli, we started with white Gaussian noise in (x, y, t). In the frequency domain, the noise power spectrum was shaped, separately in each of the four bands, to conform to the three-dimensional $(\omega_x, \omega_y, \omega_t)$ power spectrum of the signals; that is, within each spatial-frequency band, the temporal shape of the noise power spectrum was matched to the shape of the signal temporal spectrum as determined in experiment 2.
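Shaping white noise in the frequency domain can be sketched as: transform the noise, multiply each frequency component by the square root of the desired relative power, and transform back. The example below is a one-dimensional simplification with hypothetical names (`shaped_noise`, `target_power`); the experiment itself shaped a three-dimensional spectrum, and its actual software is not described in the text.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short sequences)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * i / n) for k, c in enumerate(spectrum)) / n
            for i in range(n)]


def shaped_noise(n_samples, target_power, seed=0):
    """White Gaussian noise filtered so its power spectrum follows target_power.

    target_power(f) gives the desired relative power at frequency index f,
    where f runs from 0 to n_samples // 2.
    """
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    shaped = []
    for k, c in enumerate(dft(white)):
        f = min(k, n_samples - k)  # fold negative frequencies onto positive ones
        shaped.append(c * math.sqrt(target_power(f)))  # amplitude gain = sqrt(power)
    # The gain is real and symmetric in frequency, so the result is real valued
    # up to rounding; the residual imaginary part is discarded.
    return [v.real for v in idft(shaped)]
```

For instance, `shaped_noise(16, lambda f: 1.0 if 2 <= f <= 3 else 0.0)` yields band-limited noise with no power outside frequency indices 2 and 3; a measured signal spectrum could be supplied as `target_power` in the same way.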

Signal Power in a Frame

The signal power in a frame is defined as the variance of the signal luminance over the pixels of that frame. The signal power $\sigma_s^2$ is the average power of the frames in a sequence. (In fact, the power variation between frames is small.) The noise power is computed similarly.

Signal-to-Noise Ratio

The signal-to-noise ratio s/n is $\sigma_s/\sigma_n$. Note that here the signal-to-noise ratio is defined in terms of standard deviations, the root-mean-square (rms) amplitudes of the signal and the noise. These are the square roots of the powers of the signal and the noise. A set of stimuli illustrating the noise, the signals, and their combinations is shown in Fig. 5.
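The definitions above translate directly into a few lines of code: per-frame power is the variance of luminance over pixels, sequence power is the mean of the per-frame powers, and s/n is the ratio of the square roots of signal and noise power. The representation of a sequence as a list of frames, each a list of pixel rows, and the function names are assumptions made for this sketch.

```python
import math


def frame_power(frame):
    """Variance of luminance over the pixels of one frame (a list of rows)."""
    pixels = [value for row in frame for value in row]
    mean = sum(pixels) / len(pixels)
    return sum((value - mean) ** 2 for value in pixels) / len(pixels)


def sequence_power(frames):
    """Average of the per-frame powers over a sequence of frames."""
    return sum(frame_power(frame) for frame in frames) / len(frames)


def signal_to_noise(signal_frames, noise_frames):
    """Ratio of rms amplitudes: sqrt(signal power) / sqrt(noise power)."""
    return math.sqrt(sequence_power(signal_frames)) / math.sqrt(sequence_power(noise_frames))
```

Because the ratio is taken between amplitudes, halving the noise amplitude doubles s/n, whereas a ratio of powers would quadruple.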

Procedure 

The display viewed by the subject consisted of two adjacent sequences. On the left-hand side was a noiseless sign in band i, and on the right-hand side the same ASL sign filtered in the same band i was combined with added noise from band j. 176 such pairs were presented to the subjects. The combinations of i, j, s/n, and the ASL sign occurred in random order.

Rating  Scale 

Subjects viewed the noisy and noiseless sequences side by side and were asked to rate the noisy one on the following rating scale:

0, Cannot detect sign at all;

1, Barely visible signer, but cannot see sign;

2, Visible signer, some trace of sign;

3, Can guess at sign, but most features indiscriminable;

4, Fairly discriminable sign, but some critical features missing;

5, Visible sign, but poor-quality image;

6, Highly discriminable sign with good-quality image.

Subjects used fractional ratings to describe their judgments more precisely. The noiseless sequences served as references to help the subjects anchor their responses. Ratings were collected from three subjects. Subsequently, the s/n values were adjusted to obtain a better sample of the rating function, and three more subjects were run. In this experiment alone, the subjects were hearing nonsigners.

Results 

The stimulus range was quite large, from stimuli in which the subtle details of an ASL sign were perfectly visible to stimuli in which even the presence of the signer was completely masked by noise. Thus the range of ratings, for any particular stimulus condition, was rather small. Within this range, it was most practical simply to treat the ratings numerically and to obtain the average rating across subjects. In a previous study, quality ratings were obtained for a large set of stimuli, a subset of which was then carefully tested by formal intelligibility tests. The correlation between rated quality and objectively measured intelligibility was 0.85. Considering that the intelligibility-tested stimuli were a homogeneous subset of the most-intelligible stimuli, the high correlation was, in the authors' words, "an impressive vindication of the rating procedure" (Ref. 9, p. 364).

Figure 6 shows an example of 1 of the 16 rating-versus-s/n functions for stimulus band 3 with noise band 3. The data (mean rating R versus log s/n) were fitted by three-segment linear functions (a total of three parameters) constrained as follows (s and n are shown as S and N in all the figures).

In segment 1, the left-hand asymptote was constrained to be horizontal at R = 0. In segment 3, the right-hand asymptote as s/n → ∞ was horizontal at $R = R_\infty$. Segment 2 connected segments 1 and 3. The square deviation of the data from the three-segment fit was minimized by an optimization program. Figure 6 illustrates the parameter-estimation procedure. The single masking effectiveness parameter $(s/n)_{50\%}$ used to describe each rating function is the s/n ratio at which the function attains 0.5 times its asymptotic height $R_\infty$.

Fig. 5. Examples of all combinations of band-filtered signals plus band-filtered noise. a, Gaussian noise filtered in bands 1-4 (left to right). b, Band-filtered ASL signals plus band-filtered noise. Each row represents a single signal band with band 1 at the top and band 4 on the bottom. Each column (continuing downward from a) represents a single band of Gaussian noise. The leftmost column represents the noise-free signal.
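A three-segment rating function of this kind has three free parameters: the two breakpoints on the log s/n axis and the asymptotic height. The sketch below assumes that the abscissa is log10(s/n) and uses a brute-force search over candidate breakpoints with a closed-form least-squares height. This is a stand-in, not the unspecified "optimization program" used by the authors, and all names are hypothetical.

```python
from itertools import combinations


def three_segment(x, x0, x1, r_inf):
    """Rating as a function of x = log10(s/n): 0 up to x0, linear rise, r_inf from x1 on."""
    if x <= x0:
        return 0.0
    if x >= x1:
        return r_inf
    return r_inf * (x - x0) / (x1 - x0)


def fit_three_segment(xs, ratings, breakpoints):
    """Least-squares fit by exhaustive search over pairs of candidate breakpoints.

    For each pair (x0 < x1) the best asymptote r_inf has a closed form,
    because the model is r_inf times a fixed unit-height shape.
    Returns (x0, x1, r_inf) with the smallest squared error.
    """
    best = None
    for x0, x1 in combinations(sorted(breakpoints), 2):
        shape = [three_segment(x, x0, x1, 1.0) for x in xs]
        denom = sum(g * g for g in shape)
        if denom == 0.0:
            continue
        r_inf = sum(g * y for g, y in zip(shape, ratings)) / denom
        err = sum((r_inf * g - y) ** 2 for g, y in zip(shape, ratings))
        if best is None or err < best[0]:
            best = (err, x0, x1, r_inf)
    return best[1:]


def half_height_sn(x0, x1):
    """s/n at which the fitted function reaches half its asymptotic height."""
    return 10.0 ** (0.5 * (x0 + x1))
```

Because segment 2 is a straight line from 0 to the asymptote, the half-height point lies midway between the two breakpoints on the log axis, which is what `half_height_sn` returns.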

Figure 7 shows the set of 16 estimated rating functions that describe the masking of each ASL band by each of the noise bands. The $(s_i/n_j)_{50\%}$ values derived from the rating functions of Fig. 7 are graphically displayed in Fig. 8, which summarizes the cross-band-masking data. Bands 1, 2, and 4 mask themselves better than they mask any other band. Band 3 appears to mask band 4 slightly more than it masks itself, but we do not have a test of statistical significance for this effect.

Masking as a Function of the Frequency Difference
between the Test Stimulus and the Noise Masking Stimulus
Band 1 is more sensitive to masking by noise in its own band than are frequency bands 2, 3, and 4, which, when masking themselves, are all equally effective; that is, let $(s_i/n_j)_{50\%}$ represent the masking effectiveness of noise band j on signal band i. The points $(s_i/n_i)_{50\%}$, i = 2, 3, 4, are all at the same level in Fig. 8; the point $(s_1/n_1)_{50\%}$ is much higher.

To compare band 1 with the other bands, it is necessary to normalize the masking vulnerability of different bands. Masking vulnerability is indexed by self-masking $(s_i/n_i)_{50\%}$. The normalized masking effectiveness NME is

$$\mathrm{NME}(s_i, n_j) = (s_i/n_j)_{50\%} / (s_i/n_i)_{50\%}.$$
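Reading the normalization as division of each cross-band value $(s_i/n_j)_{50\%}$ by the self-masking value $(s_i/n_i)_{50\%}$ of the same signal band, it amounts to dividing every row of the masking matrix by its diagonal entry. A minimal sketch follows; the function name and the matrix layout (rows are signal bands, columns are noise bands) are assumptions for illustration.

```python
def normalized_masking_effectiveness(sn50):
    """Normalize a square matrix of (s_i/n_j)_50% values by self-masking.

    sn50[i][j] is the 50% signal-to-noise ratio for signal band i masked by
    noise band j. Each row is divided by its diagonal entry, so every
    self-masking value becomes exactly 1 (0 on a log scale).
    """
    size = len(sn50)
    return [[sn50[i][j] / sn50[i][i] for j in range(size)] for i in range(size)]
```

After normalization, a signal band that is generally more vulnerable to noise (such as band 1 in Fig. 8) can be compared with the others on a common scale.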

Fig. 6. Average ratings as a function of signal-to-noise ratio for the signal and the noise in band 3. The data are indicated by circles; the three-segment fit is indicated by the heavy lines. The dashed lines indicate the procedure for estimating $(s/n)_{50\%}$, the abscissa value under the arrow.

Fig. 7. Rating functions for cross-band masking. The abscissa is the signal-to-noise ratio; the ordinate is the mean rating, and the curves represent the three-segment best fits to the data. Each panel represents data from one signal band; the curve label indicates the band of the noise $n_j$.

Fig. 8. Masking effectiveness of noise bands against signal bands. The abscissa is the signal band $s_i$; the ordinate is the value of $(s/n)_{50\%}$ derived from the rating functions (Fig. 7) by the estimation procedure shown in Fig. 6. The curve parameter indicates the noise band. Emphasized points indicate that the signal and the noise are in the same band.

Fig. 9. Normalized cross-band masking as a function of frequency separation. Each band is represented by its mean frequency f. The abscissa represents the log2 of $f_s/f_n$. The ordinate is the log2 of the normalized masking effectiveness, the same data as in Fig. 8 with the curves for each signal band i moved up so that $(s_i/n_i)_{50\%}$ falls at 0.0. Signal bands i and noise bands j are indicated by i + j; the center of the + indicates the plotted datum. The straight lines represent the optimal mirror-symmetric fit to the data; the lines are centered above $\log_2(f_s/f_n) = 0.46$ and have slopes of ±1.11.

Masking as a function of the frequency separation between test and noise bands is illustrated in Fig. 9. The abscissa is the ratio $f_s/f_n$ (on a log scale), where f represents the mean frequency of a band. The ordinate represents the log2 of the normalized masking effectiveness. The straight lines represent a mirror-symmetric function fitted to the data and constrained to pass through 0, 0. (The mirror-symmetric fit is the most convenient for determining whether there is any asymmetry between the masking effectivenesses of low and high frequencies.) The peak is located to the right of zero; the point of symmetry is $x = \log_2(f_s/f_n) = 0.46$, which represents a frequency ratio for optimal masking of 1:1.38. The slopes of the distance function are ±1.11.
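The two fitted numbers are expressed in log2 units, so converting them back to plain ratios is a matter of raising 2 to the given power. The short sketch below does that arithmetic; the variable names are illustrative only.

```python
def ratio_from_log2(log2_value):
    """Convert a value given in log2 units back into a plain ratio."""
    return 2.0 ** log2_value


# Point of symmetry of the fit: log2 of the frequency ratio = 0.46.
peak_frequency_ratio = ratio_from_log2(0.46)

# Slope of 1.11 in log2-log2 coordinates: change in normalized masking
# effectiveness for each doubling (one octave) of the frequency ratio.
falloff_per_octave = ratio_from_log2(1.11)
```

Here `peak_frequency_ratio` comes to about 1.38, matching the quoted ratio for optimal masking, and `falloff_per_octave` comes to about 2.16, in line with the statement that masking falls off by a factor of slightly more than 2 when the frequency separation is doubled.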

Cross-band masking is quite adequately described in terms of log frequency separation ($\log f_s - \log f_n$) without the necessity of referencing the particular frequencies that contribute to the separation. Masking falls off by a factor of slightly more than 2 when the frequency separation is doubled, a result that is generally consistent with data obtained with much simpler stimuli. The right-of-center peak in Fig. 9 indicates that noise frequencies lower than the signal mask it slightly better than do frequencies higher than the signal. This asymmetry is reflected in all six direct comparisons of masking of signal band i by noise band j compared with masking of signal band j by noise band i. For i > j, the masking effectiveness $\mathrm{NME}(s_i, n_j) > \mathrm{NME}(s_j, n_i)$. This masking asymmetry is opposite that obtained with data from simpler stimuli.

Although masking falls off with increasing frequency distance between bands, with sufficient power, any noise band can obliterate any signal band; that is, in Fig. 7 all the rating functions were driven to zero at low signal-to-noise ratios. Our spatial-frequency filters are sufficiently narrow that this effect cannot be attributed to common-frequency masking, which occurs when frequencies in the tail of the noise happen to fall within the signal band and are so highly amplified that they change the signal-to-noise ratio within the signal band itself. Most masking between widely separated frequencies is caused by nonlinear distortion in the display system and the visual system, neither of which faithfully reproduces small-amplitude variations in large signals. Both systems, in effect, create masking noise at new frequencies when confronted with high-amplitude inputs. Indeed, the two extreme-left-hand and two extreme-right-hand points in Fig. 9 are at the intensity resolution limit of the display system and might have shown less masking effect (been lower in the figure) had the display system been better able to render small signal-to-noise ratios faithfully. To determine whether masking between widely separated frequencies also arises from genuine channel interactions would require bigger interactions than those observed here. All in all, the cross-band-masking data obtained with our complex displays are quite comparable with data obtained with sinusoidal gratings.

EXPERIMENT 4: ADDING SIGNALS FROM
DIFFERENT BANDS

Typically, signal addition has been studied with simple, static signals at low contrast levels in which internal noise is dominant rather than with realistic dynamic stimuli at high contrast levels with high levels of external noise. The purpose of experiment 4 is to discover quantitatively how ASL intelligibility is affected when two dynamic signals from different spatial-frequency bands are algebraically added. The effect on performance of adding two ASL signals is an inherently complex matter because it depends on the signal-to-noise level at which the addition is tested. This dependence is derived in part from the psychometric function (performance versus s/n), which is concave up at low intensities and concave down at high intensities, and in part from more-complex factors. Thus, at high levels of s/n, performance cannot be improved by further increases in s. Insofar as we wish to characterize the efficiency of a detector in terms of internal noise, this would mean that at high input levels, internal noise is proportional to the input.

At low levels of s, performance in detection tasks typically
increases with the square of s; i.e., power-law detection is
obtained. Square-law detection is consistent with constant
internal noise, independent of s. Insofar as the square
law also applies to band-limited ASL, doubling the amplitude
of a signal in band i (and thereby quadrupling its
power) might be expected to improve intelligibility more
than would adding signal in band j (which would only double
signal power).


In contrast, consider the addition of two signals at a high
level of s/n. Within any single spatial-frequency band i,
even with noiseless stimuli, performance is not so good as in
the original unfiltered source images. Therefore, at a high
signal level in band i, adding signals from another band j is
more effective in improving performance than adding still
more signal in band i. Thus different factors are critical for
high-intensity and for low-intensity signal combinations,
and their combinatorial effects are modeled by different
rules.

To study how weak signals combine, we need a method of
generating approximately equivalent weak signals. Weakening
a signal by reducing the signal contrast relies on the
observer's internal noise to weaken the signal. Adding external
noise is obviously the better way to control signal
intelligibility. Pavel et al. showed that for constant s/n,
the signal contrast could be varied over a wide range without
affecting intelligibility. Indeed, in a preliminary study (see
Ref. 10, Exp. 4), this result was verified again with the current
set of ASL stimuli. Thus, to study how signals combine,
we may use any signals that fall within the enormous
range of contrasts that is sufficient to overcome internal
noise, and we vary intelligibility by varying external added
noise.

Method 

Overview 

The first step in the procedure is to compose the spatial-frequency
amplitude spectrum of an external noise stimulus
so that it would mask all signal bands equally. Unfortunately,
the rating functions in Fig. 7 are not parallel in the
different signal bands, so equal masking of all spatial bands
at different intensities is impossible with a single noise
source. Given that limitation, we selected a particular noise
stimulus to test, first, the intelligibility of weak signals in all
bands i under this noise and, second, the intelligibility of all
combinations of signals in band i with signals in band j.

Fig. 10. Spatial power spectrum of the composite noise used in
experiment 4. The abscissa is the log2 of the spatial frequency in
cycles per picture width (f0, the width, is 64 pixels). The extreme
left-hand side represents 1 cycle per picture; the extreme right-hand
side represents 32 cycles per picture. The ordinate represents
relative power on a linear scale.

Fig. 11. Single frames illustrating the stimuli for experiment 4: the sum of weak signals in bands i and j plus the composite noise of Fig. 10.
Composite noise is equally present in all stimuli. The leftmost column represents single-band signals, with the band indicated by the number
at the left. The other panels represent stimuli composed of two signal bands, one component band indicated by the number at the left of the
row and the other band indicated by the number at the top of the column.


Composite Noise

From the cross-band-masking data of experiment 3, we inferred
a particular composite noise that would be expected
to reduce weak signals in all bands to approximately equal
intelligibilities. (We use the term composite noise to emphasize
that the noise can be regarded as being composed of
many spatial-frequency bands, each with a different amplitude
and with a different temporal-frequency spectrum.)
Full equality of intelligibility may be impossible with any
composite noise because of the complex cross-band masking
revealed in experiment 3. Figure 10 shows the spectrum of
the noise that was used.

Signals

The signals were 80 ASL signs, basically the same set that
was used in experiment 1b. They were produced at $s_i/n$ =
0.25, where $s_i$ indicates the amplitude of signal in band i and
n indicates the rms amplitude of the composite noise stimulus.
All six combinations of signal in band i with signal in band
j, j ≠ i, were produced. There were four combinations of
signal in band i with itself (i.e., s/n = 0.5) and four stimuli
with signal in band i alone (s/n = 0.25). Additionally, a
composite signal was composed of the sum of all four bands,
each represented by its amplitude in the s/n = 0.25 condition.
The composite signal was tested alone (the control condition)
and under the composite noise (equivalent to s/n =
1.0). The stimulus conditions are illustrated in Fig. 11.
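
The condition count implied by this description can be checked with a short enumeration. This is only an illustrative sketch: the labels and variable names are assumptions introduced here, not the authors' notation.

```python
from itertools import combinations

bands = [1, 2, 3, 4]

# Six two-band sums, band i + band j with i != j (each band at s/n = 0.25).
pairs = [("pair", i, j) for i, j in combinations(bands, 2)]
# Four same-band sums, band i + band i (equivalent to s/n = 0.5).
doubled = [("doubled", i, i) for i in bands]
# Four single-band stimuli (s/n = 0.25).
singles = [("single", i, None) for i in bands]
# Composite of all four bands: alone (control) and under noise (s/n = 1.0).
composite = [("composite_alone", None, None), ("composite_in_noise", None, None)]

conditions = pairs + doubled + singles + composite
print(len(conditions))  # total number of stimulus conditions
```

The total of 16 conditions matches the 16 blocks of signs and 16 subjects used in the procedure.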

Procedure 

The 80 signs were divided into 16 blocks of 5 signs, balanced
for difficulty. A Greco-Latin square design was used to
generate a completely counterbalanced design in which every
block of ASL signs occurred in every signal condition,
and the order of conditions was balanced over subjects.
This required generating 16 different hour-long stimulus
tapes, one for each of the 16 subjects run in this experiment.
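
As a rough illustration of the counterbalancing idea, the sketch below builds a plain cyclic Latin square that assigns a condition to each subject and block. This is a simplification and an assumption: the actual design was a Greco-Latin square, which additionally balances presentation order, and the function name is hypothetical.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square: entry [subject][block] is a condition index."""
    return [[(subject + block) % n for block in range(n)] for subject in range(n)]

n = 16
design = cyclic_latin_square(n)

# Every subject sees every condition exactly once (each row is a permutation).
rows_ok = all(sorted(row) == list(range(n)) for row in design)
# Every block of signs occurs in every condition exactly once (each column too).
cols_ok = all(sorted(design[s][b] for s in range(n)) == list(range(n))
              for b in range(n))
print(rows_ok, cols_ok)
```

Only 16 × 16 = 256 subject-by-block cells exist in such a design, each paired with a single condition.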
The viewing and testing conditions were similar to those
described for experiment 1 and particularly for experiment
1b. Subjects were fluent ASL signers from the community.
As before, all subjects had good vision under the experimental
conditions as determined by an acuity test administered
before the experiment.

Results 

Figure 12 shows the results for all classes of signals confined
to a single band. At s/n = 0.25, intelligibility in all bands is
below 9%. At s/n = 0.5, intelligibility in bands 1 and 2 is
17.5%, whereas performance in bands 3 and 4 is 60.0%. At
s/n = ∞, the conditions run to test the filters in experiment
1b, intelligibility rises to 66.4% in the lowest band and up to
87.5% in band 3.

Figure 13 shows the same data as Fig. 12 plus the six
additional summation conditions of band i with band j, i ≠ j.
The points indicated with circles in Fig. 13 are precisely the
same s/n = 0.5 points as in Fig. 12. Since they do not seem
to fall any differently on the curves than do nearby points
that represent different bands, it appears that summation is
quite similar within and between bands.

Statistical Analysis of the Data

The design of experiment 4 involves three factors: 16 conditions
× 16 subjects × 16 stimulus sets. Because each subject
saw each stimulus set only once (and not once in each condition),
only 256 of the 4096 possible conditions were run.
Typical analysis-of-variance designs are inappropriate for

Fig. 12. Data from experiment 4. Intelligibility of band-limited
single-band signals in composite noise. The abscissa indicates the
spatial-frequency band of the signal; the ordinate indicates the percent
correct scored by the 16 subjects in the intelligibility test. The curve
parameter indicates the signal-to-noise ratio of the stimuli. The curve
labeled ∞ represents data obtained without added noise in experiment 1
(with different subjects and a slightly different stimulus set). On
the left-hand ordinate, the point S1-4 indicates intelligibility of the
noise-free sum signal of band 1 + band 2 + band 3 + band 4; the
point S1-4+N indicates the intelligibility of the same signal plus
noise (s/n = 1).


such a sparse design, so a simple linear model was developed.
A subject's score y for a set of five stimulus items that
constitute a condition ranges from 0 to 5 and is assumed to
be the sum of five terms: the grand mean m, factors for
condition difficulty $c_i$, the subject's skill $s_j$, the ASL set
difficulty $a_k$, and finally a term $\epsilon_{i,j,k}$ representing random error:

$$y_{i,j,k} = m + c_i + s_j + a_k + \epsilon_{i,j,k}.$$

Condition difficulty $c_i$ is estimated by

$$\hat{c}_i = \frac{1}{16} \sum_{j,k} y_{i,j,k} - m,$$

that is, by averaging over all subjects and stimulus sets in
which condition i occurred and subtracting m. Factors $s_j$
and $a_k$ are estimated similarly. The variance $e^2$ of the random
error $\epsilon$ is $(1/210) \sum \epsilon_{i,j,k}^2$, where 210 represents the degrees
of freedom, the number of cells (256) reduced by the number
of estimated parameters (1 + 15 + 15 + 15).
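
The estimation procedure for the additive model can be sketched as follows. The scores here are synthetic, and the cyclic Latin-square layout is an assumption made purely for illustration; neither is the authors' data or their exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16

# Assumed layout: subject j, stimulus set k receives condition (j + k) mod 16.
subj, sset = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
cond = (subj + sset) % n

# Synthetic scores, 0..5 items correct per cell (illustrative only).
y = rng.binomial(5, 0.4, size=(n, n)).astype(float)

m = y.mean()                                                  # grand mean
c = np.array([y[cond == i].mean() - m for i in range(n)])     # condition effects
s = y.mean(axis=1) - m                                        # subject effects
a = y.mean(axis=0) - m                                        # stimulus-set effects

pred = m + c[cond] + s[:, None] + a[None, :]
resid = y - pred

dof = n * n - (1 + 3 * (n - 1))     # 256 cells minus 46 parameters
variance = (resid ** 2).sum() / dof
rms_error = np.sqrt(variance)
print(dof, rms_error)
```

Because the design is balanced, the simple averages used here coincide with the least-squares estimates of the additive effects.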

The rms error e was found to be 0.984. This is approximately
what would be predicted from the binomial variability
of the data if the predictions $\hat{y}_{i,j,k} = m + c_i + s_j + a_k$ were based
on a completely correct model. The standard error of the
mean of the scores shown in Figs. 12 and 13 is ±4.92%.
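
One reading that reproduces the quoted standard error, assuming each plotted point is the mean of 16 independent scores (one per subject) on a 5-item scale, is the following arithmetic sketch. The variable names are illustrative assumptions.

```python
import math

rms_error = 0.984        # rms error in items correct (scores range 0 to 5)
scores_per_mean = 16     # assumed: one score per subject for each condition
items_per_score = 5

se_items = rms_error / math.sqrt(scores_per_mean)   # standard error in items
se_percent = 100.0 * se_items / items_per_score     # as percent correct
print(round(se_percent, 2))
```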

Summation as a Function of Frequency Distance between
Bands

The  amount  of  intelligibility  summation  as  a  function  of  the 
frequency  separation  between  component  signals  can  be 


Fig. 13. Data from experiment 4: intelligibility of pairs of
band-limited signals in composite noise. The ordinate, the abscissa
(spatial-frequency band), and the curves labeled 0.25 and ∞ are as in
Fig. 12. The dashed curves indicate signals composed of band i
(indicated on abscissa) and band j (indicated as the curve parameter).
The open circles represent data for i = j, the middle curve of Fig. 12.
The flat diamonds represent the addition of nearby signal bands (2 and 3);
the tall diamonds represent the addition of distant bands (1 and 4). The
pairs indicated by diamonds are matched for the strengths of their
constituent signals.
tested nicely by using the data of experiment 4. Because, at
s/n = 0.5, bands 1 and 2 have, by coincidence, exactly the
same intelligibility (17.5%) and bands 3 and 4 have the same
intelligibility (60.0%), we compare the intelligibility of band
1 plus band 4 (wide separation) with that of band 2 plus band
3 (small separation). These two points are at slightly different
intelligibility levels in Fig. 13; the small band separation
(flat diamonds) at 35% is somewhat higher than the large
separation (thin diamonds) at 25%. The probability that a
difference this large would occur by chance, estimated by a
one-tail z test, is 0.040.

To determine whether it is more efficient to improve a
weak signal in band i by adding more energy in i or to do so by
adding energy in an adjacent band j, we compare the effects
of summing two signals at s/n = 0.25. In Fig. 13, the crossings
of the curves labeled 3 and 4 at the extreme right and
the crossings of the curves labeled 1 and 2 at the extreme left
indicate that there is a tendency for the sum of band 4 +
band 4 (I = 60%) and of band 3 + band 3 (I = 60%) to be more
intelligible than band 3 + band 4 (I = 50%) and for the sums
band 1 + band 1 and band 2 + band 2 (both I = 17.5%) to be
slightly more intelligible than band 1 + band 2 (I = 16.3%).
The probabilities of these differences' occurring under the
null hypothesis are 0.024 and 0.209, respectively. Taken
together, these observations imply that, with the signal levels
studied here, there is a small but occasionally significant
tendency for component signals to contribute more to intelligibility
when they are closer in frequency.

Efficiency When Signal Power Is Constrained
For practical purposes, when two different weak visual ASL
signals are summed, the effect of frequency separation on
intelligibility is small. All the factors that might have contributed
to a separation effect or an inverse separation effect
are almost in balance at the s/n values investigated here. To
improve intelligibility, given a signal in band i, adding more
signal in any other band j is almost as effective as adding
more signal in i. In these signal manipulations, we are
speaking of signal amplitudes. If we were concerned with
signal power rather than with rms amplitude, then it would
clearly be more efficient to distribute the power over different
bands. Doubling the amplitude within a band quadruples
the power, whereas the power of signals in disjoint
bands adds linearly.
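
The amplitude-versus-power distinction can be illustrated numerically. The sketch below is an assumption-laden toy: pure sinusoids stand in for band-limited signals, and the function names and chosen frequencies are hypothetical.

```python
import numpy as np

x = np.arange(64)  # one picture width of 64 samples

def band_signal(cycles, rms_amplitude):
    """Sinusoid with the given number of cycles per picture and rms amplitude."""
    return rms_amplitude * np.sqrt(2.0) * np.sin(2.0 * np.pi * cycles * x / 64.0)

def power(signal):
    """Mean squared amplitude."""
    return float(np.mean(signal ** 2))

a = band_signal(4, 0.25)    # stand-in for a weak signal in band i
b = band_signal(16, 0.25)   # stand-in for a weak signal in a disjoint band j

p_single = power(a)         # one band alone
p_doubled = power(2 * a)    # band i + band i: amplitude doubled
p_two_bands = power(a + b)  # band i + band j: disjoint frequencies
print(p_single, p_doubled, p_two_bands)
```

Doubling the amplitude within one band costs four times the power of the single signal, whereas adding an equal-amplitude signal in a disjoint band costs only twice the power.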

SUMMARY  AND  CONCLUSIONS 

(1) In low-resolution dynamic ASL images (96 × 64 pixels),
it is possible to divide the original signal into four
different frequency bands, each of which is quite intelligible
(67-87% for isolated ASL signs) and each of which could
serve for ordinary ASL communication.

(2) The empirically determined temporal-frequency
spectrum of ASL is approximately the same in all spatial-frequency
bands.

(3) The ratio of root-mean-square signal amplitude to
noise amplitude, s/n, at which ASL becomes intelligible is
nearly the same for the three highest bands, but the critical
s/n is higher for the lowest-frequency band.

(4) The masking of signals in one band by noise in another
is governed simply by the ratio of frequencies between the
bands (the difference of the log frequencies). There is
asymmetry: noise lower in spatial frequency than the signal
is more effective in masking than is higher-frequency spatial
noise. When the frequency separation between signal and
noise is increased by a factor of 2, intelligibility can be
maintained at 1/2 the original signal-to-noise ratio.

(5) When two weak signals (s/n = 0.25) are added, the
intelligibility of the summed signal is slightly greater when
the two signals are in adjacent frequency bands than when
they are in widely separated bands; and intelligibility is slightly
greater when the two signals are identical than when they
are in adjacent bands. If the signal power (not amplitude)
is limited, intelligibility is maximized by dispersing
the signal power widely across frequency bands.

APPENDIX A: FILTER-GENERATION
ALGORITHM

This algorithm generates K filters that divide frequency
space ($\omega_x$ and $\omega_y$) into partially overlapping annular regions
whose boundaries are adjustable. The summed output of all
the filters equals the original input signal.

Let K be the desired number of filters. Let LP represent
the Fourier transform of a low-pass filter; that is, $|LP(\omega_x, \omega_y)|$
is monotonically decreasing in $\omega_x$ and $\omega_y$. (The particular
$LP_i$ that are used to feed the algorithm are defined
below.) We use the terms center and surround analogously
to their use in composing difference-of-Gaussian filters; they
refer to x,y spread functions of the filters. The center and
surround components are used as kernels to generate the
filters. The surround of filter K - i + 1 becomes the center
of filter K - i (the next lower filter in terms of frequency).
In the sum of all the filters, all the centers and surrounds
cancel, and the original source image is recovered. The
steps in the algorithm are stated in terms of the two-dimensional
Fourier transforms of the filters and their components:

(1) Define $F_K$, the highest-frequency filter. The center
of $F_K$ is defined to be $C_K = 1$. The surround of $F_K$ is defined
in terms of $LP_K$ (see below) as $S_K = 1 - (1 - LP_K)^m$; then the
Kth filter is $F_K = C_K - S_K = (1 - LP_K)^m$.

(2) Do the following loop K - 2 times (i = 1, K - 2) to
generate, in sequence, the filters K - 1, K - 2, ..., 2.

(a) Define the center of the K - i filter as the surround
of the previously defined filter: $C_{K-i} = S_{K-i+1}$.

(b) Define the surround of the K - i filter: $S_{K-i} = 1 - (1 - LP_{K-i})^m$.
The surround is a low-pass filter derived from
a generating low-pass filter $LP_{K-i}$ chosen so that $S_{K-i}$ will
have a lower cutoff frequency than $C_{K-i}$, in accordance with the
desired partition of frequency space.

(c) Define the K - i filter as the center minus the
surround: $F_{K-i} = C_{K-i} - S_{K-i}$.

(d) Increase i; if i ≤ K - 2, return to step (a); otherwise,
continue to step (3).

(3) $F_1 = 1 - \sum_{i=2}^{K} F_i = S_2$; that is, $F_1$ is the low-pass filter
that was chosen as the surround of $F_2$; it encompasses all the
residual signal. Note that $\sum_{i=1}^{K} F_i = 1$.

To begin the algorithm with $F_K = (1 - LP_K)^m$, $LP_K$ must
be defined. Let $LP_K$ be a two-dimensional Gaussian low-pass
filter whose frequency-domain representation is


616 J. Opt. Soc. Am. A/Vol. 5, No. 4/April 1988


T. R. Riedl and G. Sperling


$$LP_K(\omega_x, \omega_y, \sigma_x, \sigma_y) = \exp[-2\pi^2(\sigma_x^2 \omega_x^2 + \sigma_y^2 \omega_y^2)],$$

where $\omega_x$ and $\omega_y$ are the frequency components in the x and y
directions, respectively, and $\sigma_x$ and $\sigma_y$ are the x and y widths
of the generating spatial Gaussians. Since $F_K = (1 - LP_K)^m$,
as m increases, the frequency cutoffs become steeper, and
the overlap between filters is reduced (which is good); but for
m > 4, the ringing in the x,y-space domain becomes obtrusive
(which is bad). Therefore m = 4 was chosen.
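
The algorithm of Appendix A can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the isotropic widths (σx = σy), and the particular width values are hypothetical, and frequencies are expressed in cycles per pixel.

```python
import numpy as np

def gaussian_lowpass(wx, wy, sigma_x, sigma_y):
    """Frequency-domain Gaussian low-pass: exp[-2 pi^2 (sx^2 wx^2 + sy^2 wy^2)]."""
    return np.exp(-2.0 * np.pi ** 2 * ((sigma_x * wx) ** 2 + (sigma_y * wy) ** 2))

def make_filters(sigmas, shape=(64, 96), m=4):
    """Build K = len(sigmas) + 1 filters, returned in the order F_1 ... F_K.

    sigmas lists the spatial widths (pixels) that generate LP_K, LP_{K-1}, ...,
    LP_2, in increasing order so that each surround has a lower cutoff.
    """
    wy = np.fft.fftfreq(shape[0])[:, None]   # cycles per pixel, y direction
    wx = np.fft.fftfreq(shape[1])[None, :]   # cycles per pixel, x direction
    filters = []
    center = np.ones(shape)                  # C_K = 1
    for sigma in sigmas:
        surround = 1.0 - (1.0 - gaussian_lowpass(wx, wy, sigma, sigma)) ** m
        filters.append(center - surround)    # F = C - S
        center = surround                    # surround becomes the next center
    filters.append(center)                   # F_1 = S_2, the residual low-pass
    return filters[::-1]

filters = make_filters([1.0, 2.0, 4.0])      # hypothetical widths, K = 4

# The bands of any image sum back to the image, because the filters sum to 1.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 96))
spectrum = np.fft.fft2(image)
bands = [np.fft.ifft2(spectrum * f).real for f in filters]
reconstruction = sum(bands)
```

Because each surround is reused as the next center, the sum telescopes to 1 at every frequency, which is why the reconstruction matches the input.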
the interesting questions are, the data speak for themselves. But there is not a
trace of theory.

The treatment of spans in Woodworth and Schlosberg derives directly from

methods were universal. Wundt went further, to information processing, think-
beyond, but his methods too often were introspective. Many of Wundt's
followers were less well versed in scientific protocol than he. In their hands, the
response relations at a very global level. The descriptive formulations of 7 ± 2
will eventually be supplemented with process theories - models that embody the
step-by-step computations carried out in the cognitive microprocesses that
underlie performance. Eventually, the process models will be fleshed out with neural
components that represent the biological structures that carry out the cognitive

One particular formulation of the memory bottleneck (Heinemann, 1984) has
been extensively tested on pigeons as well as humans. Chase (1984) and
Heinemann find that pigeons make absolute judgments of auditory intensity that are
qualitatively quite similar to human judgments. They model the limited capacity
memory by assuming that, in memory, the outcome of a trial is represented by a
Fourier Motion Perception. Investigative Ophthalmology and
Charles Chubb and George Sperling, New York University


[...] mechanisms and which nonetheless display strong, [...]
[...] independent realizations (Chubb & Sperling [...]
[...] a linear bandpass
filter followed by a rectifier (absolute value, square) would suffice to
expose the motion information carried by most nonFourier [...] to
Fourier energy analysis. However,

[...] formulate apparently moving stimuli that would require two

[...] mechanism. We use [...] differences to construct apparently moving
stimuli that grossly violate scale invariance: from afar, they are seen
[...] by [...] Fourier mechanism; from close, they

[...] nonFourier [...]
George Sperling and Thomas R. Riedl. Summation and masking between
spatial frequency bands in dynamic natural visual stimuli. Investigative
SUMMATION AND MASKING BETWEEN SPATIAL FREQUENCY BANDS
IN DYNAMIC NATURAL VISUAL STIMULI
George Sperling & Thomas R. Riedl, New York University

Dynamic images of a [...] individual of American Sign [Language]
(ASL) were bandpass filtered in adjacent frequency bands. Intelligibility of a
band was determined by testing deaf subjects fluent in ASL. By iteratively varying the
center frequencies and bandwidths of the spatial bandpass filters, it was possible to divide
the original signal (96 x 64 pixels) into four adjacent intelligible frequency bands with
mean frequencies of 3[...], 15, and 2[...] cycles per frame width. All bands were found to
have the same temporal frequency spectrum up to a multiplicative constant.

Masking of signals in band i by noise in band j (4 x 4 conditions) was measured by a
rating method. The power ratio within a band i, Power_S(i)/Power_N(i), required to produce a
criterion rating response was the same for bands 2, 3, 4 and higher for band 1 (3 c/frame).
The logarithm of the normalized crossband masking effectiveness was inversely
proportional to log |(0.7 [...])|. The 0.7 indicates an asymmetry: Masking of
high frequency signals by low frequency noise is slightly greater than the masking of low
frequencies by highs.

Weak signals from bands i and j were linearly added and tested for intelligibility.
Intelligibility was slightly greater for signals in the same band (i = j) versus adjacent
bands, and for adjacent bands versus distant bands. Obviously, for strong signals, adding
different bands produces more-intelligible combinations than does increasing power
within a band.

The high intelligibility achieved by the narrow bands of our low resolution signals
indicates that high-resolution broad spectrum signals could be decomposed into many
nonoverlapping frequency bands, each of which contained sufficient information for
interpreting ASL.

Supported in part by AFOSR Life Sciences Directorate Grant SM)3&t and NSF Science and
Technology to Aid the Handicapped, Grant rnt'S0l7n89


